XR Adaptive Modality: Experiment Report

Author

Mohammad Dastgheib

Published

January 16, 2026

Follow-up Models: Gaze-only and Hand-only

Data Quality & Coverage Gate

Note

This section guards against misleading model fits caused by missing design cells. Current data: N = 81 participants.

Participant counts. Note: `df` denotes the analysis data set of correct, non-practice trials with RT ∈ [150, 6000] ms.

| Metric | Count |
|:--|--:|
| Total participants (raw) | 81 |
| Participants with any valid trials | 81 |
| Participants in df (correct, RT-filtered) | 81 |

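The `df` filter described above can be expressed as a simple predicate. This is a minimal sketch, assuming a hypothetical trial record with `correct`, `rt_ms`, and `practice` fields; the report's actual column names and pipeline may differ.

```python
from dataclasses import dataclass

RT_MIN_MS, RT_MAX_MS = 150, 6000  # inclusive RT bounds stated in the report


@dataclass
class Trial:
    """Hypothetical trial record; field names are illustrative, not the study's schema."""
    pid: str
    correct: bool
    rt_ms: float
    practice: bool


def is_valid(trial: Trial) -> bool:
    """True if the trial belongs in `df`: correct, non-practice, RT within [150, 6000] ms."""
    return (
        trial.correct
        and not trial.practice
        and RT_MIN_MS <= trial.rt_ms <= RT_MAX_MS
    )


def filter_trials(trials):
    """Keep only trials that pass the validity predicate."""
    return [t for t in trials if is_valid(t)]


demo = [
    Trial("P001", True, 820.0, False),   # kept
    Trial("P001", False, 910.0, False),  # error trial: dropped
    Trial("P001", True, 120.0, False),   # faster than 150 ms: dropped
    Trial("P001", True, 700.0, True),    # practice trial: dropped
]
print(len(filter_trials(demo)))  # 1
```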
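Condition coverage (modality × ui_mode × pressure). The trial counts below sum to 16711, the same trial total used in the error-rate model.

| modality | ui_mode | pressure | trials | pids | missing_cell | Status |
|:--|:--|--:|--:|--:|:--|:--|
| hand | static | 0 | 1998 | 73 | FALSE | OK |
| hand | static | 1 | 2038 | 75 | FALSE | OK |
| hand | adaptive | 0 | 2025 | 74 | FALSE | OK |
| hand | adaptive | 1 | 2052 | 75 | FALSE | OK |
| gaze | static | 0 | 2125 | 78 | FALSE | OK |
| gaze | static | 1 | 2158 | 79 | FALSE | OK |
| gaze | adaptive | 0 | 2182 | 80 | FALSE | OK |
| gaze | adaptive | 1 | 2133 | 78 | FALSE | OK |
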
All factors have ≥2 levels in the data.
Blocks logged per participant
pid blocks_logged
P001 8
P002 7
P003 8
P004 8
P005 8
P006 8
P007 8
P008 8
P009 8
P010 8
P011 8
P012 8
P013 8
P014 8
P015 8
P016 8
P017 8
P018 8
P019 8
P020 8
P021 8
P022 8
P023 8
P024 8
P025 8
P026 8
P027 8
P028 8
P029 8
P030 4
P031 8
P032 8
P035 8
P036 8
P037 4
P038 8
P039 8
P040 8
P041 8
P042 8
P043 4
P045 8
P046 8
P047 8
P048 7
P049 8
P050 8
P051 8
P054 8
P055 8
P057 6
P058 8
P059 4
P060 8
P061 4
P062 8
P063 4
P064 8
P065 8
P066 8
P067 8
P068 8
P069 8
P070 8
P072 4
P073 5
P074 8
P075 8
P076 8
P077 8
P078 7
P079 8
P080 8
P081 8
P082 8
P083 8
P084 8
P085 8
P086 8
P087 8
P088 8
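A minimal sketch of this coverage gate, assuming trials carry (modality, ui_mode, pressure) labels and that each participant should log eight blocks (one per design cell; this is an assumption based on the 2×2×2 design). The helper names are hypothetical. Applied to the table above, the block check would flag P002, P030, P037, P043, P048, P057, P059, P061, P063, P072, P073, and P078.

```python
from itertools import product

# Design factors and level labels as they appear in the tables above.
FACTORS = {
    "modality": ("hand", "gaze"),
    "ui_mode": ("static", "adaptive"),
    "pressure": (0, 1),
}
EXPECTED_BLOCKS = 8  # assumed: one block per modality x ui_mode x pressure cell


def missing_cells(observed_cells):
    """Return design cells (modality, ui_mode, pressure) with no observed trials."""
    present = set(observed_cells)
    return [cell for cell in product(*FACTORS.values()) if cell not in present]


def incomplete_participants(blocks_logged, expected=EXPECTED_BLOCKS):
    """Return {pid: n_blocks} for participants who logged fewer than `expected` blocks."""
    return {pid: n for pid, n in blocks_logged.items() if n < expected}


blocks = {"P001": 8, "P002": 7, "P030": 4}
cells = [("hand", "static", 0), ("gaze", "adaptive", 1)]
print(len(missing_cells(cells)))        # 6 (toy input covers only 2 of 8 cells)
print(incomplete_participants(blocks))  # {'P002': 7, 'P030': 4}
```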

1. Executive Summary

This report analyzes 81 participants performing Fitts’ law pointing tasks across two input modalities (Hand, Gaze), two UI modes (Static, Adaptive), and two pressure levels (0 = off, 1 = on).

Results Snapshot (N = 81)


### Gaze: Adaptive vs Static (Primary Adaptive Test)
Gaze contrasts: adaptive - static (by pressure)

| modality | pressure | tp_diff_adapt_static | rt_diff_adapt_static | err_diff_adapt_static |
|:--|--:|--:|--:|--:|
| gaze | 0 | -0.0237599 | 0.0356422 | -0.0005056 |
| gaze | 1 | -0.1245809 | 0.0529007 | -0.0156147 |

### Hand: Pressure Effect (UI Mode Not Exercised)
Hand contrasts: pressure ON - pressure OFF

| modality | tp_diff_p1_p0 | rt_diff_p1_p0 | err_diff_p1_p0 |
|:--|--:|--:|--:|
| hand | -0.0737157 | 0.0202401 | 0.000645 |

*Note:* Hand width inflation did not activate (width_scale_factor always 1); UI mode is not interpreted as an adaptive manipulation for hand. The gaze-only UI Mode × Pressure model is the primary test of adaptive vs static effects.
RQ2 snapshot: Overall TLX

| modality | ui_mode | Mean_Overall_TLX |
|:--|:--|--:|
| gaze | adaptive | 46.8 |
| gaze | static | 45.8 |
| hand | adaptive | 41.1 |
| hand | static | 40.8 |

RQ3 manipulation check: width scaling

| modality | ui_mode | Mean_Width_Scale | Pct_Scaled |
|:--|:--|--:|--:|
| gaze | adaptive | 1 | 0 |
| gaze | static | 1 | 0 |
| hand | adaptive | 1 | 0 |
| hand | static | 1 | 0 |

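*Note:* Hand width inflation did not activate; all recorded `width_scale_factor` values equal 1.0. For gaze, a factor of 1.0 is expected, because the gaze adaptive manipulation is declutter rather than width scaling. See the root-cause diagnostic section for an investigation of the hand non-activation.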

Key Findings

  • Total Trials Analyzed: 14953 valid trials (correct responses, RT 150-6000ms)
  • Total Trials Collected: 17442
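  • Overall Error Rate: 10.5% (error-model trial set, 16711 trials); 2489 of the 17442 collected trials (14%) were not retained as valid
  • Mean Throughput: 3.36 bits/s (SD = 1.04)
  • Mean Movement Time: 1.147 s (SD = 0.469 s)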

2. Demographics

Sample Size: N = 81 participants.

Overall Demographics

| N | Mean Age | SD Age | Age Range | Mean Gaming (Hrs/Week) | SD Gaming |
|--:|--:|--:|:--|--:|--:|
| 81 | 30.1 | 8 | 18-62 | 1.5 | 4.3 |

By Gender

| gender | Count | Avg Age | SD Age | Avg Gaming (Hrs/Week) |
|:--|--:|--:|--:|--:|
| female | 36 | 30.3 | 8.7 | 0.1 |
| male | 45 | 30.0 | 7.5 | 2.6 |

Input Device Distribution

| input_device | Count | Percentage (%) |
|:--|--:|--:|
| mouse | 75 | 92.6 |
| trackpad | 6 | 7.4 |

Gaming Status

Participants were primarily non-gamers (median self-reported gaming = 0 hours/week; only 11.1% reported ≥5 hrs/week).

3. Primary Analysis: Throughput

Research Question: Does the Adaptive UI improve performance (Throughput) compared to Static, especially for Gaze?

Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users). See Data Quality section for input device exclusion rationale.

Analysis Note: We observe a highly significant main effect of modality (hand > gaze) on throughput, although its standardized size is small (η²p ≈ .04; see Written Results). Interaction effects are treated as exploratory.

Summary Statistics

Throughput (bits/s) by Condition (N = 81 participants)

| modality | ui_mode | pressure | N_participants | N_observations | Mean | SD | Median | Q25 | Q75 |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| hand | static | 0 | 73 | 219 | 3.56 | 0.92 | 3.48 | 2.94 | 4.06 |
| hand | static | 1 | 75 | 224 | 3.53 | 0.98 | 3.49 | 2.92 | 4.16 |
| hand | adaptive | 0 | 74 | 222 | 3.59 | 0.96 | 3.61 | 2.86 | 4.25 |
| hand | adaptive | 1 | 75 | 225 | 3.47 | 0.95 | 3.49 | 2.68 | 4.18 |
| gaze | static | 0 | 77 | 227 | 3.23 | 1.06 | 3.10 | 2.49 | 3.80 |
| gaze | static | 1 | 78 | 231 | 3.22 | 1.08 | 3.08 | 2.51 | 3.74 |
| gaze | adaptive | 0 | 80 | 233 | 3.18 | 1.09 | 3.01 | 2.43 | 3.85 |
| gaze | adaptive | 1 | 78 | 226 | 3.10 | 1.11 | 2.86 | 2.34 | 3.61 |

Visualizations

Throughput by Modality and UI Mode (participant-level means). N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.

Estimated Marginal Means for Throughput. N = 81 participants.

Statistical Model Results

Planned Sample Size & Power

The throughput analysis was designed as a within-subjects 2×2×2 factorial (modality × UI mode × pressure). However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode cannot be interpreted as an adaptation for hand. The primary adaptive test is therefore the gaze-only UI Mode × Pressure interaction, which evaluates whether declutter (the gaze adaptive manipulation that did execute) improves performance. Standard repeated-measures power calculations and guidelines (Cohen, 1988; Brysbaert, 2019) indicate that N ≈ 50 participants provides 80% power to detect dz ≈ 0.40. We therefore set N = 48 (six complete Williams squares of eight condition orders each) as the primary design target, with the option to extend to N = 64 (eight squares) if recruitment permitted; the final sample (N = 81) exceeds both targets. Given the large number of trials per condition and the random-intercept mixed-effects model, this sample size is expected to provide high power for the modality main effect and the gaze-only adaptive effects. The omnibus UI mode effect, by contrast, is diluted by the hand non-manipulation and should be interpreted through the targeted gaze-only follow-up.
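The N ≈ 50 figure for dz ≈ 0.40 can be checked with a quick normal-approximation sketch for a single paired (within-subjects) contrast. This is an approximation, not the calculation used in the study: it ignores the mixed-model structure, and an exact noncentral-t calculation requires a few more participants.

```python
import math
from statistics import NormalDist


def n_for_paired_design(dz, alpha=0.05, power=0.80):
    """Approximate participants needed for a two-sided paired test.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / dz) ** 2.
    """
    z = NormalDist()
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / dz) ** 2
    return math.ceil(n)


def approx_power(dz, n, alpha=0.05):
    """Approximate power of the same test with n participants (upper tail only)."""
    z = NormalDist()
    return z.cdf(dz * math.sqrt(n) - z.inv_cdf(1 - alpha / 2))


print(n_for_paired_design(0.40))         # 50
print(round(approx_power(0.40, 48), 2))  # 0.79
print(round(approx_power(0.40, 64), 2))  # 0.89
```

Under this approximation, N = 48 sits just below 80% power for dz = 0.40, and N = 64 clears it comfortably.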

### Model: TP ~ modality * ui_mode * pressure + (1 | pid)

**Model Structure:** Random intercept only (participant-level random effects). The model is fit to 1807 throughput observations aggregated from the trial-level data (see `Number of obs` in the model summary), not to the 14953 individual trials. Neither count should be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (81).

**Data Summary:** 81 participants, 1807 throughput observations, 8 conditions, minimum 219 observations per condition.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                          Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)    
modality                  45.690  45.690     1 1743.3 69.7142 <2e-16 ***
ui_mode                    1.199   1.199     1 1726.8  1.8295 0.1764    
pressure                   1.699   1.699     1 1727.5  2.5920 0.1076    
modality:ui_mode           0.537   0.537     1 1727.0  0.8193 0.3655    
modality:pressure          0.025   0.025     1 1727.3  0.0388 0.8439    
ui_mode:pressure           0.788   0.788     1 1728.5  1.2028 0.2729    
modality:ui_mode:pressure  0.001   0.001     1 1727.4  0.0010 0.9753    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Analysis Note:** At N = 81, three-way interactions may be underpowered. Non-significant interaction effects should be treated as exploratory.


#### Model Summary
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: formula_tp$formula
   Data: df_tp_model
Control: lmerControl(optimizer = "bobyqa")

      AIC       BIC    logLik -2*log(L)  df.resid 
   4597.4    4652.4   -2288.7    4577.4      1797 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.5970 -0.7147 -0.0052  0.6421  4.4587 

Random effects:
 Groups   Name        Variance Std.Dev.
 pid      (Intercept) 0.3832   0.6190  
 Residual             0.6554   0.8096  
Number of obs: 1807, groups:  pid, 81

Fixed effects:
                               Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                   3.344e+00  7.147e-02  8.119e+01  46.798   <2e-16
modality1                     1.622e-01  1.943e-02  1.743e+03   8.350   <2e-16
ui_mode1                      2.579e-02  1.907e-02  1.727e+03   1.353    0.176
pressure1                     3.072e-02  1.908e-02  1.727e+03   1.610    0.108
modality1:ui_mode1           -1.726e-02  1.907e-02  1.727e+03  -0.905    0.366
modality1:pressure1           3.757e-03  1.907e-02  1.727e+03   0.197    0.844
ui_mode1:pressure1           -2.095e-02  1.911e-02  1.729e+03  -1.097    0.273
modality1:ui_mode1:pressure1 -5.906e-04  1.908e-02  1.727e+03  -0.031    0.975
                                
(Intercept)                  ***
modality1                    ***
ui_mode1                        
pressure1                       
modality1:ui_mode1              
modality1:pressure1             
ui_mode1:pressure1              
modality1:ui_mode1:pressure1    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) mdlty1 ui_md1 prssr1 md1:_1 mdl1:1 u_m1:1
modality1    0.011                                          
ui_mode1     0.001  0.000                                   
pressure1    0.002  0.006  0.007                            
mdlty1:_md1  0.001  0.003  0.014 -0.004                     
mdlty1:prs1  0.003  0.002 -0.005  0.016  0.007              
u_md1:prss1  0.003 -0.005  0.003  0.006  0.007  0.004       
mdlty1:_1:1 -0.002  0.007  0.006  0.000  0.003  0.001  0.011

#### Written Results (APA Style)

**Modality Effect:** A linear mixed-effects model revealed a significant main effect of input modality on throughput, F(1, 1743.3) = 69.71, p < .001, η²p = 0.038 (small effect). 
Hand input produced higher throughput (M = 3.51, 95% CI [3.36, 3.66] bits/s) than gaze input (M = 3.18, 95% CI [3.03, 3.33] bits/s).
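The η²p values reported in this section match the standard conversion from an F statistic and its degrees of freedom, η²p = F·df1 / (F·df1 + df2). A minimal sketch, assuming this is the conversion the report used; with Satterthwaite denominator df it is only an approximation for mixed models:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Modality effect on throughput, F(1, 1743.3) = 69.71
print(round(partial_eta_squared(69.7142, 1, 1743.3), 3))  # 0.038
# Modality effect on log RT, F(1, 14904) = 177.95
print(round(partial_eta_squared(177.9518, 1, 14904), 3))  # 0.012
```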

**UI Mode Effect (Omnibus):** The main effect of UI mode was non-significant, F(1, 1726.8) = 1.83, p = 0.176, η²p = 0.001 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode × Pressure follow-up model** below.

#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)

Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)
ui_mode          1.20992 1.20992     1 838.12  2.0586 0.1517
pressure         0.77726 0.77726     1 836.92  1.3224 0.2505
ui_mode:pressure 0.36436 0.36436     1 841.02  0.6199 0.4313

**Estimated Marginal Means (Gaze-only):**


Table: Estimated Marginal Means for Throughput: Gaze-only

|UI Mode  |Pressure | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|:--------|----------------:|------------:|------------:|
|Static   |0        |             3.23|         3.03|         3.43|
|Adaptive |0        |             3.20|         3.00|         3.40|
|Static   |1        |             3.21|         3.02|         3.41|
|Adaptive |1        |             3.10|         2.90|         3.30|

**Key Contrasts (Gaze-only, Holm-adjusted):**


|contrast                              | estimate|    SE|      df| t.ratio| p.value|
|:-------------------------------------|--------:|-----:|-------:|-------:|-------:|
|static pressure0 - adaptive pressure0 |    0.033| 0.072| 842.509|   0.453|   1.000|
|static pressure0 - adaptive pressure1 |    0.131| 0.072| 840.480|   1.814|   0.421|
|adaptive pressure0 - static pressure1 |   -0.015| 0.072| 840.692|  -0.204|   1.000|
|static pressure1 - adaptive pressure1 |    0.113| 0.072| 842.799|   1.564|   0.591|

#### Hand-Only Follow-up: Pressure Effect

*Note:* UI mode is excluded from hand models by design because width scaling did not execute.

Type III Analysis of Variance Table with Satterthwaite's method
         Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)
pressure 1.0731  1.0731     1 818.48  1.6205 0.2034

**Estimated Marginal Means (Hand-only):**


Table: Estimated Marginal Means for Throughput: Hand-only (pressure effect)

|Pressure | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|0        |             3.57|         3.43|         3.71|
|1        |             3.50|         3.36|         3.64|

**Pressure Contrast (Hand-only, Holm-adjusted):**


|contrast              | estimate|    SE|      df| t.ratio| p.value|
|:---------------------|--------:|-----:|-------:|-------:|-------:|
|pressure0 - pressure1 |     0.07| 0.055| 819.347|   1.272|   0.204|

**Modality × UI Mode Interaction:** The interaction between modality and UI mode was non-significant, F(1, 1727.0) = 0.82, p = 0.366, η²p = 0.000 (negligible effect). This suggests that the effect of UI mode did not differ significantly between hand and gaze modalities.


#### Effect Size: Hand vs. Gaze (Collapsed Over UI Mode and Pressure)


Table: Estimated Marginal Means for Throughput by Modality (collapsed over UI mode and pressure)

|Modality | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|Hand     |             3.51|         3.36|         3.66|
|Gaze     |             3.18|         3.03|         3.33|

**Difference (Hand - Gaze):** 0.32 bits/s


#### Pairwise Comparisons (Holm-adjusted)


Table: Pairwise Comparisons with Effect Sizes (Holm-adjusted p-values)

|contrast                                          |   estimate|        SE| Cohen's d (approx)|Effect Size |p-value |       df|
|:-------------------------------------------------|----------:|---------:|------------------:|:-----------|:-------|--------:|
|hand static pressure0 - gaze static pressure0     |  0.2962922| 0.0772791|              0.426|small       |= 0.002 | 1738.000|
|hand static pressure0 - hand adaptive pressure0   | -0.0260350| 0.0773102|             -0.037|negligible  |= 1.000 | 1733.571|
|hand static pressure0 - gaze adaptive pressure0   |  0.3416609| 0.0768298|              0.494|small       |< .001  | 1738.485|
|hand static pressure0 - hand static pressure1     |  0.0258562| 0.0772010|              0.037|negligible  |= 1.000 | 1734.303|
|hand static pressure0 - gaze static pressure1     |  0.3094844| 0.0770863|              0.446|small       |= 0.001 | 1739.357|
|hand static pressure0 - hand adaptive pressure1   |  0.0859997| 0.0771106|              0.124|negligible  |= 1.000 | 1734.249|
|hand static pressure0 - gaze adaptive pressure1   |  0.4363065| 0.0773386|              0.627|medium      |< .001  | 1737.768|
|gaze static pressure0 - hand adaptive pressure0   | -0.3223272| 0.0770623|             -0.465|small       |< .001  | 1738.545|
|gaze static pressure0 - gaze adaptive pressure0   |  0.0453687| 0.0758682|              0.066|negligible  |= 1.000 | 1735.084|
|gaze static pressure0 - hand static pressure1     | -0.2704360| 0.0769477|             -0.391|small       |= 0.007 | 1739.248|
|gaze static pressure0 - gaze static pressure1     |  0.0131922| 0.0760477|              0.019|negligible  |= 1.000 | 1735.250|
|gaze static pressure0 - hand adaptive pressure1   | -0.2102925| 0.0768566|             -0.304|small       |= 0.082 | 1739.212|
|gaze static pressure0 - gaze adaptive pressure1   |  0.1400143| 0.0763446|              0.204|small       |= 0.802 | 1733.915|
|hand adaptive pressure0 - gaze adaptive pressure0 |  0.3676959| 0.0765456|              0.534|medium      |< .001  | 1738.367|
|hand adaptive pressure0 - hand static pressure1   |  0.0518912| 0.0768864|              0.075|negligible  |= 1.000 | 1733.776|
|hand adaptive pressure0 - gaze static pressure1   |  0.3355194| 0.0767695|              0.486|small       |< .001  | 1738.908|
|hand adaptive pressure0 - hand adaptive pressure1 |  0.1120347| 0.0767962|              0.162|negligible  |= 1.000 | 1733.728|
|hand adaptive pressure0 - gaze adaptive pressure1 |  0.4623415| 0.0771220|              0.666|medium      |< .001  | 1738.313|
|gaze adaptive pressure0 - hand static pressure1   | -0.3158047| 0.0762918|             -0.460|small       |< .001  | 1737.543|
|gaze adaptive pressure0 - gaze static pressure1   | -0.0321765| 0.0754205|             -0.047|negligible  |= 1.000 | 1733.865|

4. Movement Time Analysis (Core Confirmatory)

Research Question: How does movement time vary across conditions?

This analysis is part of the core confirmatory battery for RQ1 and RQ3. Movement time is mathematically coupled with throughput (TP = ID/RT) and serves as a complementary performance metric.
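Throughput in Fitts' law studies is usually computed from the effective index of difficulty, IDe = log2(De/We + 1), with effective width We = 4.133 × SD of endpoint deviations, divided by mean movement time. The sketch below assumes that standard effective-width formulation; the report's exact pipeline (e.g., how De is measured or how trials are grouped before computing TP) is not shown here, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def effective_width(endpoint_deviations):
    """We = 4.133 * SD of selection endpoints along the task axis (same units as distance)."""
    return 4.133 * stdev(endpoint_deviations)


def effective_id(effective_distance, we):
    """Shannon formulation: IDe = log2(De / We + 1), in bits."""
    return math.log2(effective_distance / we + 1)


def throughput(effective_distance, endpoint_deviations, movement_times_s):
    """TP = IDe / mean MT, in bits/s, for one group of trials (e.g., one block)."""
    we = effective_width(endpoint_deviations)
    return effective_id(effective_distance, we) / mean(movement_times_s)


devs = [-2.0, 0.0, 2.0]         # toy endpoint deviations (px); SD = 2, so We = 8.266
de = 7 * effective_width(devs)  # chosen so that De / We = 7 and IDe = 3 bits
print(round(throughput(de, devs, [0.9, 1.0, 1.1]), 2))  # 3.0
```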

Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).

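Relationship to Throughput: As with throughput, hand is faster than gaze. Unlike the throughput model, however, the trial-level log-RT model detects statistically significant main effects of UI mode and pressure and a modality × UI mode interaction, all with negligible effect sizes (η²p ≈ .001 or smaller). The UI-mode effect is concentrated in gaze, where adaptive trials were slower than static trials (raw means 1.235 and 1.242 s vs 1.168 and 1.184 s).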

Summary Statistics

Movement Time (s) by Condition (N = 81 participants)

| modality | ui_mode | pressure | N_participants | N_trials | Mean | SD | Median |
|:--|:--|--:|--:|--:|--:|--:|--:|
| hand | static | 0 | 73 | 1961 | 1.086 | 0.425 | 1.007 |
| hand | static | 1 | 75 | 2006 | 1.106 | 0.411 | 1.036 |
| hand | adaptive | 0 | 74 | 1994 | 1.080 | 0.405 | 1.012 |
| hand | adaptive | 1 | 75 | 2012 | 1.100 | 0.407 | 1.024 |
| gaze | static | 0 | 78 | 1734 | 1.168 | 0.485 | 1.074 |
| gaze | static | 1 | 78 | 1715 | 1.184 | 0.480 | 1.083 |
| gaze | adaptive | 0 | 80 | 1782 | 1.235 | 0.579 | 1.093 |
| gaze | adaptive | 1 | 78 | 1749 | 1.242 | 0.523 | 1.119 |

Visualizations

Movement Time by Modality and UI Mode (participant-level means). N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.

Estimated Marginal Means for Movement Time. N = 81 participants.

Statistical Model Results

Planned Sample Size & Power

The log-RT analysis uses the same 2×2×2 within-subjects design and random-intercept LMM as the throughput analysis. However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptation for hand. The primary adaptive test is the gaze-only UI Mode × Pressure interaction. Because throughput and RT are mathematically coupled (TP = ID/RT), the sample-size logic is identical: N = 48 is sufficient for detecting dz ≈ 0.40–0.50 differences with ≈0.80 power, and N = 64 further strengthens power for smaller effects and interactions (Cohen, 1988). Trial-level modeling with many repeated observations per participant increases precision, but our power planning is intentionally conservative and based on participant-level effects rather than naïvely counting trials.

Note on unbalanced design: As in the throughput analysis, the hand modality includes N = 75 participants (mouse users only) and the gaze modality N = 81 (75 mouse + 6 trackpad users). Type III ANOVA with sum-to-zero contrasts handles this imbalance appropriately (Fox & Weisberg, 2019).

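Random Effects Structure: All mixed models in this report use a random intercept for participants (1 | pid), which is a stable baseline. Because it omits by-participant random slopes, p-values for within-participant effects (particularly in the trial-level RT and error models, whose denominator df approach 15,000) may be anti-conservative; richer random-effects structures (e.g., (1 + modality | pid)) may be tested as a robustness check.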

### Model:  log_rt ~ modality * ui_mode * pressure + (1 | pid) 

**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈ 14953) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (81).

**Data Summary:** 81 participants, 14953 trials, 8 conditions, minimum 1715 trials per condition.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                           Sum Sq Mean Sq NumDF DenDF  F value    Pr(>F)    
modality                  15.2141 15.2141     1 14904 177.9518 < 2.2e-16 ***
ui_mode                    1.2045  1.2045     1 14873  14.0890 0.0001750 ***
pressure                   1.0602  1.0602     1 14874  12.4007 0.0004305 ***
modality:ui_mode           1.3524  1.3524     1 14873  15.8183 7.006e-05 ***
modality:pressure          0.0375  0.0375     1 14874   0.4385 0.5078801    
ui_mode:pressure           0.0009  0.0009     1 14876   0.0100 0.9203074    
modality:ui_mode:pressure  0.0036  0.0036     1 14874   0.0418 0.8379702    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Written Results (APA Style)

**Modality Effect:** A linear mixed-effects model on log-transformed movement time revealed a significant main effect of input modality, F(1, 14904.1) = 177.95, p < .001, η²p = 0.012 (small effect).
Hand input produced faster movement times (M = 6.950 log ms, 95% CI [6.908, 6.991]; back-transformed geometric mean ≈ 1.04 s [1.00, 1.09]) than gaze input (M = 7.015 log ms, 95% CI [6.973, 7.057]; ≈ 1.11 s [1.07, 1.16]).
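Because this model is fit to log RT, its EMMs and contrasts are on the log scale. Assuming natural logarithms of RT in milliseconds (consistent with exp(6.95) ≈ 1043 ms, in line with the raw medians in the summary table), they can be back-transformed as sketched below. Back-transformed EMMs are geometric means, and a log-scale difference becomes a multiplicative ratio.

```python
import math


def backtransform_log_ms(log_ms):
    """Convert a natural-log-millisecond value to seconds (a geometric mean for EMMs)."""
    return math.exp(log_ms) / 1000.0


def ratio_from_log_diff(diff):
    """A difference on the log scale is a multiplicative ratio on the raw scale."""
    return math.exp(diff)


print(round(backtransform_log_ms(6.950), 2))         # 1.04  (hand)
print(round(backtransform_log_ms(7.015), 2))         # 1.11  (gaze)
print(round(ratio_from_log_diff(7.015 - 6.950), 3))  # 1.067 (gaze about 7% slower)
```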

**UI Mode Effect (Omnibus):** The main effect of UI mode on movement time was significant, F(1, 14873.2) = 14.09, p < .001, η²p = 0.001 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute, so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode × Pressure follow-up model** below.

#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)

Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
ui_mode          1.76884 1.76884     1 6902.6 21.4673 3.665e-06 ***
pressure         0.29054 0.29054     1 6900.2  3.5262   0.06045 .  
ui_mode:pressure 0.00146 0.00146     1 6907.4  0.0178   0.89397    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Estimated Marginal Means (Gaze-only):**


Table: Estimated Marginal Means for Movement Time (log scale, natural log of ms): Gaze-only

|UI Mode  |Pressure | Mean log RT (log ms)| 95% CI Lower| 95% CI Upper|
|:--------|:--------|-----------:|------------:|------------:|
|Static   |0        |       6.987|        6.931|        7.044|
|Adaptive |0        |       7.019|        6.962|        7.075|
|Static   |1        |       6.999|        6.943|        7.056|
|Adaptive |1        |       7.032|        6.976|        7.089|

**Key Contrasts (Gaze-only, Holm-adjusted):**


|contrast                              | estimate|   SE|  df| z.ratio| p.value|
|:-------------------------------------|--------:|----:|---:|-------:|-------:|
|static pressure0 - adaptive pressure0 |   -0.031| 0.01| Inf|  -3.186|   0.006|
|static pressure0 - adaptive pressure1 |   -0.045| 0.01| Inf|  -4.599|   0.000|
|adaptive pressure0 - static pressure1 |    0.019| 0.01| Inf|   1.958|   0.151|
|static pressure1 - adaptive pressure1 |   -0.033| 0.01| Inf|  -3.348|   0.004|

#### Hand-Only Follow-up: Pressure Effect

*Note:* UI mode is excluded from hand models by design because width scaling did not execute.

Type III Analysis of Variance Table with Satterthwaite's method
          Sum Sq Mean Sq NumDF  DenDF F value   Pr(>F)   
pressure 0.81498 0.81498     1 7905.1  10.277 0.001352 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Estimated Marginal Means (Hand-only):**


Table: Estimated Marginal Means for Movement Time (log scale, natural log of ms): Hand-only (pressure effect)

|Pressure | Mean log RT (log ms)| 95% CI Lower| 95% CI Upper|
|:--------|-----------:|------------:|------------:|
|0        |       6.932|        6.897|        6.968|
|1        |       6.953|        6.917|        6.988|

**Pressure Contrast (Hand-only, Holm-adjusted):**


|contrast              | estimate|    SE|  df| z.ratio| p.value|
|:---------------------|--------:|-----:|---:|-------:|-------:|
|pressure0 - pressure1 |    -0.02| 0.006| Inf|  -3.206|   0.001|

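**Modality × UI Mode Interaction:** The interaction was significant, F(1, 14873.3) = 15.82, p < .001, η²p = 0.001 (negligible effect). The pairwise comparisons show that the UI-mode difference is confined to gaze: at pressure 0, gaze adaptive trials were slower than gaze static trials (Δ log RT = 0.037, Holm p = .003), whereas hand static and hand adaptive did not differ (Δ log RT < 0.001, Holm p = 1.000).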


#### Pairwise Comparisons (Holm-adjusted)
| contrast | estimate | SE | df | z.ratio | p.value |
|:--|--:|--:|--:|--:|--:|
| hand static pressure0 - gaze static pressure0 | -0.05063470 | 0.009704014 | Inf | -5.218 | <0.0001 |
| hand static pressure0 - hand adaptive pressure0 | -0.00038816 | 0.009307398 | Inf | -0.042 | 1.0000 |
| hand static pressure0 - gaze adaptive pressure0 | -0.08723914 | 0.009647232 | Inf | -9.043 | <0.0001 |
| hand static pressure0 - hand static pressure1 | -0.02154994 | 0.009303428 | Inf | -2.316 | 0.1643 |
| hand static pressure0 - gaze static pressure1 | -0.06386266 | 0.009752761 | Inf | -6.548 | <0.0001 |
| hand static pressure0 - hand adaptive pressure1 | -0.01901221 | 0.009293782 | Inf | -2.046 | 0.2447 |
| hand static pressure0 - gaze adaptive pressure1 | -0.10146835 | 0.009685099 | Inf | -10.477 | <0.0001 |
| gaze static pressure0 - hand adaptive pressure0 | 0.05024653 | 0.009671825 | Inf | 5.195 | <0.0001 |
| gaze static pressure0 - gaze adaptive pressure0 | -0.03660444 | 0.009891420 | Inf | -3.701 | 0.0026 |
| gaze static pressure0 - hand static pressure1 | 0.02908476 | 0.009666402 | Inf | 3.009 | 0.0262 |
| gaze static pressure0 - gaze static pressure1 | -0.01322796 | 0.009987095 | Inf | -1.325 | 0.5968 |
| gaze static pressure0 - hand adaptive pressure1 | 0.03162249 | 0.009659086 | Inf | 3.274 | 0.0117 |
| gaze static pressure0 - gaze adaptive pressure1 | -0.05083365 | 0.009928210 | Inf | -5.120 | <0.0001 |
| hand adaptive pressure0 - gaze adaptive pressure0 | -0.08685098 | 0.009606228 | Inf | -9.041 | <0.0001 |
| hand adaptive pressure0 - hand static pressure1 | -0.02116177 | 0.009257010 | Inf | -2.286 | 0.1643 |
| hand adaptive pressure0 - gaze static pressure1 | -0.06347449 | 0.009711027 | Inf | -6.536 | <0.0001 |
| hand adaptive pressure0 - hand adaptive pressure1 | -0.01862405 | 0.009247490 | Inf | -2.014 | 0.2447 |
| hand adaptive pressure0 - gaze adaptive pressure1 | -0.10108019 | 0.009654203 | Inf | -10.470 | <0.0001 |
| gaze adaptive pressure0 - hand static pressure1 | 0.06568920 | 0.009579917 | Inf | 6.857 | <0.0001 |
| gaze adaptive pressure0 - gaze static pressure1 | 0.02337648 | 0.009903533 | Inf | 2.360 | 0.1643 |

Degrees-of-freedom method: asymptotic 
P value adjustment: holm method for 28 tests 
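The 28 tests are all C(8, 2) = 28 pairs of the eight design cells (only 20 rows are printed above). Holm's correction is a step-down procedure: sort the p-values, multiply the k-th smallest by (m - k + 1), and enforce monotonicity. A minimal sketch of the procedure, not the emmeans implementation itself:

```python
from math import comb


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 = smallest p-value
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # adjusted p-values never decrease
        adjusted[idx] = running_max
    return adjusted


print(comb(8, 2))  # 28 pairwise tests among 8 cells
print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```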

5. Fitts’ Law Modelling

Research Question: How well do the data fit Fitts’ law? (Linearity check.)

Planned Sample Size & Power

Fitts’ law analyses serve primarily to validate the pointing task and modality differences, not to test the core adaptation hypotheses. The ID effect on movement time is typically very large (R² > .70), and robust Fitts-law slopes are observable with as few as 10–20 participants in classic HCI work. In this study, any final sample N ≥ 30 is more than sufficient for stable ID slopes; our planned N = 48 places this analysis in an over-powered, descriptive regime. We therefore do not perform formal power calculations here and treat Fitts regression as a manipulation check and descriptive characterization of the dataset.

Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).

Flatter slopes (smaller b in MT = a + b × IDe) indicate less sensitivity to task difficulty, as expected for more ballistic movement.

Fitts Law Regression (Movement Time vs Effective Index of Difficulty). N = 81 participants. The effective index of difficulty (IDe) is calculated using the effective target width (We) derived from the spatial distribution of selection endpoints. Shaded regions around regression lines represent 95% confidence intervals. Linear regression fits are shown separately for each modality and UI mode combination.

### Model Fit Statistics
Linear Regression: MT ~ IDe (N = 81 participants)

| modality | ui_mode | r_squared | slope (s/bit) | intercept (s) |
|:--|:--|--:|--:|--:|
| hand | static | 0.492 | 0.152 | 0.505 |
| hand | adaptive | 0.470 | 0.142 | 0.544 |
| gaze | static | 0.299 | 0.173 | 0.558 |
| gaze | adaptive | 0.263 | 0.190 | 0.548 |


6. Error Rate Analysis (Core Confirmatory)

Research Question: How do error rates differ across conditions?

This analysis is part of the core confirmatory battery for RQ1 and RQ3.

Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).

Error Rates by Condition (N = 81 participants; rates in %)

| modality | ui_mode | pressure | Participants | Mean_Error_Rate (%) | SD_Error_Rate (%) |
|:--|:--|--:|--:|--:|--:|
| hand | static | 0 | 73 | 1.88 | 4.83 |
| hand | static | 1 | 75 | 1.58 | 4.05 |
| hand | adaptive | 0 | 74 | 1.55 | 4.71 |
| hand | adaptive | 1 | 75 | 1.98 | 4.69 |
| gaze | static | 0 | 78 | 18.45 | 13.92 |
| gaze | static | 1 | 78 | 19.65 | 10.81 |
| gaze | adaptive | 0 | 80 | 18.40 | 12.86 |
| gaze | adaptive | 1 | 78 | 18.09 | 12.61 |

Error Rate by Modality and UI Mode (participant-level means). N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.


**Error Rate Summary:** Overall error rate was 10.5%. Errors were concentrated in gaze conditions (18.8%), while hand remained near 1.7%.


Statistical Model Results

Planned Sample Size & Power

For the error-rate analysis we fit a binomial GLMM with random intercepts per participant. However, the HAND adaptive manipulation (width inflation) did not execute (width_scale_factor always 1), so UI mode is not interpretable as an adaptation for hand. The primary adaptive test is the gaze-only UI Mode × Pressure interaction. We expect odds-ratio effects in the small-to-medium range (e.g., OR ≈ 0.7–0.8 for adaptive vs static in gaze, and OR ≈ 2–3 for gaze vs hand). Binary outcomes with relatively low error rates (≈10–15%) typically require more participants than continuous outcomes for stable mixed-effects estimation (Kumle et al., 2021). For this analysis, we therefore treat N = 64 as a “good” target that yields comfortable power for medium effects, while N = 48 remains adequate but somewhat less stable, especially for interaction terms and rare error types. Error-based interaction effects are interpreted as exploratory, even at N = 64.
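To make the expected odds ratios concrete: an odds ratio rescales the odds, not the probability. A minimal sketch, using the gaze error rate observed in this dataset (18.8%) as a hypothetical baseline, shows that OR ≈ 0.7 to 0.8 would correspond to roughly 14% to 16% errors.

```python
def apply_odds_ratio(baseline_p, odds_ratio):
    """Probability implied by multiplying the baseline odds by `odds_ratio`."""
    odds = baseline_p / (1 - baseline_p) * odds_ratio
    return odds / (1 + odds)


# Hypothetical: adaptive-vs-static odds ratios applied to an 18.8% gaze error rate
print(round(apply_odds_ratio(0.188, 0.7), 3))  # 0.139
print(round(apply_odds_ratio(0.188, 0.8), 3))  # 0.156
```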

### Model:  error ~ modality * ui_mode * pressure + (1 | pid) 

**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈ 16711) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (81).

**Data Summary:** 81 participants, 16711 trials, 8 conditions, minimum 1998 trials per condition.

**Overall Error Rate:** 10.5%

#### ANOVA Table (Type III for unbalanced design)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: error
                              Chisq Df Pr(>Chisq)    
(Intercept)               1003.8308  1     <2e-16 ***
modality                   865.9770  1     <2e-16 ***
ui_mode                      0.2190  1     0.6398    
pressure                     0.2232  1     0.6366    
modality:ui_mode             0.2895  1     0.5906    
modality:pressure            0.0014  1     0.9705    
ui_mode:pressure             0.5398  1     0.4625    
modality:ui_mode:pressure    2.5821  1     0.1081    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Pairwise Comparisons (Omnibus, Holm-adjusted)
*Note:* Omnibus UI mode effects are diluted by hand non-manipulation. See gaze-only follow-up below.

| contrast | odds.ratio | SE | df | null | z.ratio | p.value |
|:--|--:|--:|--:|--:|--:|--:|
| hand static pressure0 / gaze static pressure0 | 0.072752 | 0.012976 | Inf | 1 | -14.694 | <0.0001 |
| hand static pressure0 / hand adaptive pressure0 | 1.230786 | 0.305880 | Inf | 1 | 0.836 | 1.0000 |
| hand static pressure0 / gaze adaptive pressure0 | 0.073652 | 0.013129 | Inf | 1 | -14.633 | <0.0001 |
| hand static pressure0 / hand static pressure1 | 1.190374 | 0.293420 | Inf | 1 | 0.707 | 1.0000 |
| hand static pressure0 / gaze static pressure1 | 0.064129 | 0.011392 | Inf | 1 | -15.463 | <0.0001 |
| hand static pressure0 / hand adaptive pressure1 | 0.954876 | 0.222743 | Inf | 1 | -0.198 | 1.0000 |
| hand static pressure0 / gaze adaptive pressure1 | 0.076127 | 0.013581 | Inf | 1 | -14.436 | <0.0001 |
| gaze static pressure0 / hand adaptive pressure0 | 16.917569 | 3.259856 | Inf | 1 | 14.678 | <0.0001 |
| gaze static pressure0 / gaze adaptive pressure0 | 1.012366 | 0.082995 | Inf | 1 | 0.150 | 1.0000 |
| gaze static pressure0 / hand static pressure1 | 16.362097 | 3.109667 | Inf | 1 | 14.706 | <0.0001 |
| gaze static pressure0 / gaze static pressure1 | 0.881475 | 0.071005 | Inf | 1 | -1.566 | 1.0000 |
| gaze static pressure0 / hand adaptive pressure1 | 13.125087 | 2.264219 | Inf | 1 | 14.924 | <0.0001 |
| gaze static pressure0 / gaze adaptive pressure1 | 1.046398 | 0.086192 | Inf | 1 | 0.551 | 1.0000 |
| hand adaptive pressure0 / gaze adaptive pressure0 | 0.059841 | 0.011516 | Inf | 1 | -14.633 | <0.0001 |
| hand adaptive pressure0 / hand static pressure1 | 0.967166 | 0.248472 | Inf | 1 | -0.130 | 1.0000 |
| hand adaptive pressure0 / gaze static pressure1 | 0.052104 | 0.009998 | Inf | 1 | -15.398 | <0.0001 |
| hand adaptive pressure0 / hand adaptive pressure1 | 0.775826 | 0.189496 | Inf | 1 | -1.039 | 1.0000 |
| hand adaptive pressure0 / gaze adaptive pressure1 | 0.061853 | 0.011921 | Inf | 1 | -14.440 | <0.0001 |
| gaze adaptive pressure0 / hand static pressure1 | 16.162232 | 3.065616 | Inf | 1 | 14.671 | <0.0001 |
| gaze adaptive pressure0 / gaze static pressure1 | 0.870708 | 0.069299 | Inf | 1 | -1.740 | 0.9013 |

P value adjustment: holm method for 28 tests 
Tests are performed on the log odds ratio scale 
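The Holm step-down procedure used throughout this report can be sketched in a few lines of Python. It follows the usual definition (sort the p-values, multiply the k-th smallest by m - k + 1, then enforce monotonicity); it is an illustration, not the emmeans code that produced the table, and the input p-values in the example are made up.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank `rank` is multiplied by (m - rank), capped at 1 ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # ... and adjusted values are forced to be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values for a family of four tests.
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

For the 28-test omnibus family above, the smallest raw p-value is multiplied by 28, so the correction is considerably stricter than in the smaller follow-up families.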

#### Gaze-Only Follow-up: UI Mode × Pressure (Primary Adaptive Test)

**ANOVA (Type III):**
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: error
                    Chisq Df Pr(>Chisq)    
(Intercept)      360.4270  1     <2e-16 ***
ui_mode            2.4509  1     0.1175    
pressure           0.7239  1     0.3949    
ui_mode:pressure   2.0557  1     0.1516    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Estimated Marginal Means (Gaze-only, response scale):**


Table: Estimated Marginal Means for Error Rate: Gaze-only (UI Mode × Pressure)

|UI Mode  |Pressure | Mean Error Rate (%)| 95% CI Lower (%)| 95% CI Upper (%)|
|:--------|:--------|---------------:|------------:|------------:|
|Static   |0        |            16.6|         14.2|         19.5|
|Adaptive |0        |            16.6|         14.1|         19.4|
|Static   |1        |            18.5|         15.9|         21.6|
|Adaptive |1        |            16.1|         13.7|         18.9|

**Key Contrasts (Gaze-only, Holm-adjusted, odds ratio scale):**


|contrast                              | odds.ratio|    SE|  df| null| z.ratio| p.value|
|:-------------------------------------|----------:|-----:|---:|----:|-------:|-------:|
|static pressure0 / adaptive pressure0 |      1.007| 0.082| Inf|    1|   0.082|   1.000|
|static pressure0 / adaptive pressure1 |      1.042| 0.085| Inf|    1|   0.497|   1.000|
|adaptive pressure0 / static pressure1 |      0.871| 0.069| Inf|    1|  -1.738|   0.411|
|static pressure1 / adaptive pressure1 |      1.187| 0.095| Inf|    1|   2.136|   0.196|
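emmeans tests on the log odds-ratio scale and appears to report the SE of each odds ratio via the delta method, \(SE(OR) = OR \times SE(\log OR)\), so the z-ratios in the table can be reconstructed from the OR and SE columns. A minimal check in Python using values copied from the table above; small discrepancies come from rounding of the tabled values, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_z_from_or(odds_ratio: float, se_odds_ratio: float) -> float:
    """Recover the Wald z-ratio from an odds ratio and its delta-method SE."""
    se_log = se_odds_ratio / odds_ratio  # delta method: SE(log OR) = SE(OR) / OR
    return math.log(odds_ratio) / se_log


def two_sided_p(z: float) -> float:
    """Unadjusted two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# "static pressure1 / adaptive pressure1" row: OR = 1.187, SE = 0.095.
z = wald_z_from_or(1.187, 0.095)
print(round(z, 3), round(two_sided_p(z), 4))
```

This gives z ≈ 2.14 and a raw two-sided p ≈ 0.032. Multiplying by 6, the number of pairwise comparisons among the four gaze cells, gives about 0.19, close to the reported Holm-adjusted 0.196, so the table's p-values appear consistent with a six-contrast Holm family of which four rows are shown.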

#### Hand-Only Follow-up: Pressure Effect

*Note:* UI mode is excluded from hand models by design because width scaling did not execute.

**ANOVA (Type III):**
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: error
              Chisq Df Pr(>Chisq)    
(Intercept) 59.3148  1  1.344e-14 ***
pressure     0.0429  1      0.836    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Estimated Marginal Means (Hand-only, response scale):**


Table: Estimated Marginal Means for Error Rate: Hand-only (pressure effect)

|Pressure | Mean Error Rate| 95% CI Lower| 95% CI Upper|
|:--------|---------------:|------------:|------------:|
|0        |             0.1|            0|          0.5|
|1        |             0.1|            0|          0.5|

**Pressure Contrast (Hand-only, Holm-adjusted, odds ratio scale):**


|contrast              | odds.ratio|    SE|  df| null| z.ratio| p.value|
|:---------------------|----------:|-----:|---:|----:|-------:|-------:|
|pressure0 / pressure1 |      0.964| 0.173| Inf|    1|  -0.207|   0.836|

7. Accuracy & Gaze Dynamics

Sample Size: N = 75 participants for hand modality (mouse users only), N = 81 participants for gaze modality (mouse + trackpad users).

Effective Width (\(W_e\))

Planned Sample Size & Power

Effective width (\(W_e\)) is analyzed at the participant × condition level with a Gaussian LMM. We expect medium effects of modality (gaze > hand) and small-to-medium effects of UI mode (adaptive slightly improving spatial precision). For within-subject effects of this magnitude (dz ≈ 0.4–0.5), N ≈ 48 provides roughly 0.80 power or better according to standard repeated-measures power guidelines (Cohen, 1988). We therefore treat N = 48 as a good target for \(W_e\). N = 64 adds a margin if UI-mode effects turn out closer to dz ≈ 0.3, although effects that small would need roughly 90 participants to reach 0.80 power.
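The sample-size reasoning for a within-subject effect can be checked with a normal approximation to the power of a two-sided paired test, \(\text{power} \approx \Phi(d_z\sqrt{n} - z_{1-\alpha/2})\). This is a sketch using only the Python standard library; exact noncentral-t calculations (e.g., in G*Power) typically require about two more participants than these figures.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def paired_power(dz: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test with standardized effect dz."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation; the probability of rejecting in the wrong tail is ignored.
    return _STD_NORMAL.cdf(dz * math.sqrt(n) - z_crit)


def required_n(dz: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest n reaching the target power under the same normal approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / dz) ** 2)


for dz in (0.5, 0.4, 0.3):
    print(f"dz = {dz}: n for 0.80 power ~ {required_n(dz)}; "
          f"power at N = 48: {paired_power(dz, 48):.2f}, at N = 64: {paired_power(dz, 64):.2f}")
```

Under this approximation, N = 48 gives roughly 0.79 power at dz = 0.4 and N = 64 gives roughly 0.67 at dz = 0.3; reaching 0.80 at dz = 0.3 needs about 88 participants (around 90 with the exact t-based calculation).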

Lower \(W_e\) indicates tighter shot grouping (higher precision).
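The report does not state how \(W_e\) is computed. A common convention in Fitts' law studies (MacKenzie's adaptation of ISO 9241-9) is \(W_e = 4.133 \times SD\), where SD is the standard deviation of endpoint deviations along the task axis and \(4.133 = \sqrt{2\pi e}\); under normality about 96% of endpoints fall within this width. The sketch below assumes that convention and a univariate SD (a bivariate variant is also used in the literature); `effective_width` is a hypothetical helper.

```python
import statistics


def effective_width(endpoint_deviations):
    """Effective width W_e = 4.133 * SD of signed endpoint deviations along the task axis (px)."""
    if len(endpoint_deviations) < 2:
        raise ValueError("need at least two endpoints to estimate a standard deviation")
    return 4.133 * statistics.stdev(endpoint_deviations)  # sample SD (n - 1 denominator)


# Hypothetical signed endpoint deviations (px) from the target center for one cell.
print(round(effective_width([-6.0, -2.0, 0.0, 3.0, 5.0]), 2))
```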

Effective Width (px) by Condition (N = 81 participants)

|modality |ui_mode  |pressure | N_participants| Mean_We| SD_We|
|:--------|:--------|:--------|--------------:|-------:|-----:|
|hand     |static   |0        |             73|   33.68| 20.76|
|hand     |static   |1        |             75|   33.13| 20.49|
|hand     |adaptive |0        |             74|   33.46| 20.98|
|hand     |adaptive |1        |             75|   34.75| 21.45|
|gaze     |static   |0        |             77|   35.84| 19.60|
|gaze     |static   |1        |             78|   35.72| 19.77|
|gaze     |adaptive |0        |             80|   34.76| 18.93|
|gaze     |adaptive |1        |             78|   36.42| 20.10|

Effective target width was broadly similar between Static and Adaptive within each modality; gaze showed slightly larger \(W_e\) overall, consistent with higher variability in endpoint location.

Effective Target Width (Accuracy) by Modality and UI Mode. N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower values indicate tighter shot grouping and higher precision.

Statistical Analysis: Effective Width

### ANOVA: Effective Width (Type III)



Table: Mixed-effects model: We ~ Modality × UI Mode × Pressure (N = 81 participants)

|                          |    Sum Sq|   Mean Sq| NumDF|    DenDF| F value| Pr(>F)|
|:-------------------------|---------:|---------:|-----:|--------:|-------:|------:|
|modality                  | 1703.1717| 1703.1717|     1| 1783.168|  4.1869| 0.0409|
|ui_mode                   |   29.8092|   29.8092|     1| 1726.387|  0.0733| 0.7867|
|pressure                  |  137.8330|  137.8330|     1| 1729.079|  0.3388| 0.5606|
|modality:ui_mode          |   85.6434|   85.6434|     1| 1727.324|  0.2105| 0.6464|
|modality:pressure         |   16.6078|   16.6078|     1| 1728.095|  0.0408| 0.8399|
|ui_mode:pressure          |  381.2582|  381.2582|     1| 1735.188|  0.9372| 0.3331|
|modality:ui_mode:pressure |    0.0094|    0.0094|     1| 1728.993|  0.0000| 0.9962|

**Modality effect:** F(1, 1783.2) = 4.19, p = .041.

**UI mode effect:** F(1, 1726.4) = 0.07, p = .787.

**Modality × UI mode interaction:** F(1, 1727.3) = 0.21, p = .646.

### Estimated Marginal Means (by Modality × UI Mode × Pressure)



Table: Effective Width (px) by Modality, UI Mode, and Pressure

|ui_mode  |modality |pressure | emmean|   SE|      df| lower.CL| upper.CL|
|:--------|:--------|:--------|------:|----:|-------:|--------:|--------:|
|static   |hand     |0        |  33.69| 1.38| 1398.11|    30.98|    36.41|
|adaptive |hand     |0        |  33.46| 1.37| 1395.54|    30.77|    36.16|
|static   |gaze     |0        |  35.88| 1.36| 1402.92|    33.22|    38.54|
|adaptive |gaze     |0        |  34.79| 1.34| 1403.27|    32.16|    37.42|
|static   |hand     |1        |  33.13| 1.37| 1396.20|    30.45|    35.81|
|adaptive |hand     |1        |  34.75| 1.36| 1393.31|    32.07|    37.42|
|static   |gaze     |1        |  35.71| 1.35| 1398.17|    33.07|    38.35|
|adaptive |gaze     |1        |  36.45| 1.36| 1411.59|    33.78|    39.11|

### Pairwise Comparisons (Holm-adjusted)



Table: UI Mode comparisons within each Modality × Pressure cell

|modality |pressure |contrast          | estimate|    SE| t-ratio| p-value|       df|
|:--------|:--------|:-----------------|--------:|-----:|-------:|-------:|--------:|
|hand     |0        |static - adaptive |    0.231| 1.921|    0.12|   0.904| 1723.504|
|gaze     |0        |static - adaptive |    1.093| 1.882|    0.58|   0.561| 1734.248|
|hand     |1        |static - adaptive |   -1.616| 1.904|   -0.85|   0.396| 1720.903|
|gaze     |1        |static - adaptive |   -0.736| 1.888|   -0.39|   0.697| 1735.628|

**APA-formatted summary (omnibus):** No significant differences in effective width between UI modes within modalities (all p > 0.05).

#### Gaze-Only Follow-up: UI Mode × Pressure

**Estimated Marginal Means (Gaze-only):**


Table: Effective Width: Gaze-only (UI Mode × Pressure)

|UI Mode  |Pressure | Mean We (px)|
|:--------|:--------|------------:|
|Static   |0        |        35.84|
|Adaptive |0        |        34.76|
|Static   |1        |        35.72|
|Adaptive |1        |        36.42|

**Key Contrasts (Gaze-only, Holm-adjusted):**


|contrast                              | estimate|    SE|      df| t.ratio| p.value|
|:-------------------------------------|--------:|-----:|-------:|-------:|-------:|
|static pressure0 - adaptive pressure0 |    1.080| 1.828| 849.096|   0.591|       1|
|static pressure0 - adaptive pressure1 |   -0.580| 1.842| 844.157|  -0.315|       1|
|adaptive pressure0 - static pressure1 |   -0.961| 1.820| 842.471|  -0.528|       1|
|static pressure1 - adaptive pressure1 |   -0.700| 1.834| 850.539|  -0.381|       1|

#### Hand-Only Follow-up: Pressure Effect

*Note:* UI mode is excluded from hand models by design because width scaling did not execute.

**Estimated Marginal Means (Hand-only):**


Table: Effective Width: Hand-only (pressure effect)

|Pressure | Mean We (px)|
|:--------|------------:|
|0        |        33.57|
|1        |        33.94|

**Pressure Contrast (Hand-only, Holm-adjusted):**


|contrast              | estimate|    SE|      df| t.ratio| p.value|
|:---------------------|--------:|-----:|-------:|-------:|-------:|
|pressure0 - pressure1 |   -0.368| 1.402| 821.242|  -0.263|   0.793|

Endpoint Accuracy Scatter Plot

Visualization of endpoint errors relative to target center. Each point represents one trial’s endpoint position.

Endpoint Accuracy Scatter Plot for Gaze Modality. N = 81 participants. Each point represents one trial endpoint position relative to the target center (0,0). The red dashed circle shows the approximate target size. Points closer to the center indicate better accuracy. Dotted lines indicate zero error in X and Y directions. Faceted by pressure condition.
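
Endpoint Error Distance (px) for Gaze Modality

|ui_mode  |pressure | N_trials| Mean_Error| SD_Error| Median_Error|
|:--------|:--------|--------:|----------:|--------:|------------:|
|static   |0        |     1734|      11.45|     7.81|         9.37|
|static   |1        |     1715|      11.75|     7.85|         9.73|
|adaptive |0        |     1782|      11.66|     7.96|         9.53|
|adaptive |1        |     1749|      11.77|     8.04|         9.78|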
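A minimal sketch of how an endpoint-error summary like the one above could be computed, assuming error distance is the Euclidean distance in pixels from each endpoint to the target center (the table does not state the metric explicitly). The function and field names are hypothetical and simply mirror the table columns.

```python
import math
import statistics


def endpoint_error_summary(endpoints, target=(0.0, 0.0)):
    """Mean, SD, and median Euclidean distance (px) of endpoints from the target center."""
    tx, ty = target
    distances = [math.hypot(x - tx, y - ty) for x, y in endpoints]
    return {
        "N": len(distances),
        "Mean_Error": statistics.mean(distances),
        "SD_Error": statistics.stdev(distances),  # sample SD
        "Median_Error": statistics.median(distances),
    }


# Hypothetical endpoints (px) relative to the target center, as in the scatter plot.
print(endpoint_error_summary([(3.0, 4.0), (0.0, 0.0), (6.0, 8.0)]))
```

Because distances are bounded below by zero, their distribution is typically right-skewed, which is consistent with the medians in the table sitting below the means.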

The “Midas Touch” Struggle

Planned Sample Size & Power

Target re-entries are count-like and somewhat noisy, but we again analyze participant-level averages with an LMM (or, if needed, a Poisson GLMM). We anticipate medium modality effects (more re-entries for gaze) and small-to-medium UI-mode effects (fewer re-entries under adaptation). Because this metric is noisier, a somewhat larger sample is desirable if it is to be treated as confirmatory. We therefore treat N = 48 as adequate but exploratory and N = 64 as a “good” sample size for detecting medium within-subject effects in re-entry counts. Power reasoning follows the same logic as for other continuous repeated-measures outcomes, tempered by mixed-model guidance from Kumle et al. (2021).

Target re-entries count how often the cursor left the target and then re-entered it before selection.

Re-entries are interpreted here as a proxy for control stability; higher counts suggest more corrective movements. We will revisit this metric in the control-theory analyses (Section 10).
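The exact re-entry rule is not given in this section. One plausible definition, assumed here, counts every outside-to-inside transition after the first target entry, using a per-sample flag for whether the cursor is inside the target. `count_reentries` is a hypothetical helper.

```python
def count_reentries(inside_flags):
    """Count target re-entries: entries into the target after the first one.

    inside_flags is a per-sample sequence of booleans (True = cursor inside target).
    """
    entries = 0
    previous = False
    for inside in inside_flags:
        if inside and not previous:  # outside -> inside transition
            entries += 1
        previous = inside
    return max(entries - 1, 0)       # the first entry is not a re-entry


print(count_reentries([False, True, True, False, True, False, False, True]))  # 2
```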

Target Re-entries by Condition (N = 81 participants)

|modality |ui_mode  |pressure | N_participants| Mean_Reentries| SD_Reentries|
|:--------|:--------|:--------|--------------:|--------------:|------------:|
|hand     |static   |0        |             73|           0.86|         0.55|
|hand     |static   |1        |             75|           0.85|         0.55|
|hand     |adaptive |0        |             74|           0.87|         0.55|
|hand     |adaptive |1        |             75|           0.85|         0.57|
|gaze     |static   |0        |             77|           2.19|         1.34|
|gaze     |static   |1        |             78|           2.19|         1.33|
|gaze     |adaptive |0        |             80|           2.26|         1.45|
|gaze     |adaptive |1        |             78|           2.18|         1.32|

Target Re-entries (Control Stability) by Modality and UI Mode. N = 81 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower values are better.

8. Workload (NASA-TLX) (Core Confirmatory)

Subjective workload scores (lower is better).

Research Question: How does subjective workload differ across conditions? Does Adaptive UI reduce workload?

This analysis is part of the core confirmatory battery for RQ2 and RQ3.

Metric Definition: We use the unweighted NASA-TLX, computed as the mean of the six subscales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration). Each subscale is rated on a 0-100 scale, and the overall TLX score is the arithmetic mean of all six subscales. Lower values indicate lower subjective workload.
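The unweighted (raw) TLX score defined above is a plain mean of six ratings. A minimal sketch with input validation follows; the dictionary keys are hypothetical, and it assumes the Performance subscale is oriented as on the standard form (0 = perfect, 100 = failure) so that higher always means more workload.

```python
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict) -> float:
    """Unweighted (raw) NASA-TLX: arithmetic mean of the six 0-100 subscale ratings."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise KeyError(f"missing subscales: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} rating {ratings[name]} is outside 0-100")
    return sum(ratings[name] for name in TLX_SUBSCALES) / len(TLX_SUBSCALES)


print(raw_tlx({"mental": 50, "physical": 40, "temporal": 30,
               "performance": 60, "effort": 70, "frustration": 20}))  # 45.0
```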

Sample Size: N = 75 participants for hand modality (mouse users only), N = 75 participants for gaze modality (mouse + trackpad users).

NASA-TLX Workload Scores by Modality and UI Mode. N = 75 participants. Scores range from 0-100, where lower values indicate lower subjective workload. The six TLX scales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration) are shown separately. White diamonds show mean values. Violin plots show the distribution shape, boxplots show quartiles.

NASA-TLX Workload Components (Stacked Bar Chart). N = 75 participants. Total height represents overall workload, with each colored segment representing one of the six TLX scales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, Frustration). Lower total height indicates lower overall subjective workload.

Statistical Model: Overall TLX

Planned Sample Size & Power

NASA-TLX scores (overall and subscales) are collected at the block level and analyzed with an LMM (random intercepts per participant; fixed effects for modality and UI mode). TLX scores tend to be reasonably reliable, and we expect medium effects for both modality (gaze > hand) and UI mode (adaptive < static), especially on Physical Demand and Frustration. For within-subject designs with medium effects, ≈40–50 participants typically provide ≥0.80 power (Brysbaert, 2019). We therefore treat N = 48 as a good, pre-planned N for TLX analyses. An increase to N = 64 would mostly refine confidence intervals and interaction estimates rather than change the main power conclusions.

Note on unbalanced design: As in the other analyses, the hand modality has N=70 (mouse users only) and the gaze modality N=75 (70 mouse + 5 trackpad users). Type III ANOVA with sum-to-zero contrasts handles this imbalance appropriately (Fox & Weisberg, 2019).
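Why sum-to-zero contrasts matter can be shown with the 2 × 2 overall-TLX means. With an interaction in the model, a treatment-coded modality coefficient reflects the gaze-hand difference at the reference UI mode only, whereas a Type III test with sum-to-zero contrasts concerns the difference averaged with equal weight over UI modes, regardless of how many observations each cell contains. The sketch works directly on cell means (taken from the overall-TLX condition table later in this section) rather than fitting a model; `marginal_effects_2x2` is a hypothetical helper.

```python
def marginal_effects_2x2(cells):
    """Compare two definitions of the modality effect in a 2 x 2 table of cell means.

    cells maps (modality, ui_mode) -> mean.
    """
    simple_static = cells[("gaze", "static")] - cells[("hand", "static")]
    simple_adaptive = cells[("gaze", "adaptive")] - cells[("hand", "adaptive")]
    return {
        # What a treatment-coded main-effect coefficient reflects (reference = static).
        "treatment_coded_at_static": simple_static,
        # What a Type III test with sum-to-zero contrasts is about: equal-weight average.
        "equal_weight_average": (simple_static + simple_adaptive) / 2.0,
    }


# Overall-TLX estimated marginal means by condition, as reported in this section.
emm = {("hand", "static"): 41.2, ("gaze", "static"): 45.7,
       ("hand", "adaptive"): 41.0, ("gaze", "adaptive"): 47.3}
print(marginal_effects_2x2(emm))
```

The equal-weight difference (5.4 points) matches the gap between the modality means in the written results (46.5 vs 41.1), whereas the treatment-coded quantity (4.5 points) describes the static UI mode only.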

Random Effects Structure: All mixed models in this report use a random intercept for participants (1 | pid), which is a simple and numerically stable baseline. Because omitting by-participant random slopes can make tests of within-participant effects anti-conservative, we may test richer random-effects structures (e.g., (1 + modality | pid)) as a robustness check.

### Model: overall_tlx ~ modality * ui_mode + (1 | pid)

**Data Summary:** 81 participants, 763 observations.

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                 Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)    
modality         4346.2  4346.2     1 550.02 73.9854 <2e-16 ***
ui_mode            75.0    75.0     1 550.19  1.2770 0.2590    
modality:ui_mode  118.4   118.4     1 550.10  2.0157 0.1562    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Written Results (APA Style)

**Modality Effect:** A linear mixed-effects model revealed a significant main effect of input modality on overall NASA-TLX workload, F(1, 550.0) = 73.99, p < .001, η²p = 0.119 (medium effect). 
Gaze input produced higher workload (M = 46.5, 95% CI [43.2, 49.8]) than hand input (M = 41.1, 95% CI [37.8, 44.4]).
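The η²p values in these written results appear consistent with the standard conversion from an F statistic, \(\eta^2_p = F\,df_1 / (F\,df_1 + df_2)\). For mixed models with Satterthwaite denominator df this is an approximation rather than a variance-decomposition effect size. A quick check using the F and df values from the ANOVA table above:

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F and df values copied from the overall-TLX ANOVA table.
for term, f, df1, df2 in [("modality", 73.9854, 1, 550.02),
                          ("ui_mode", 1.2770, 1, 550.19),
                          ("modality:ui_mode", 2.0157, 1, 550.10)]:
    print(f"{term}: eta_p^2 = {partial_eta_squared(f, df1, df2):.3f}")
```

This reproduces the reported 0.119, 0.002, and 0.004.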

**UI Mode Effect (Omnibus):** The main effect of UI mode on workload was non-significant, F(1, 550.2) = 1.28, p = .259, η²p = 0.002 (negligible effect). **Note:** This omnibus UI mode effect is diluted by the fact that HAND width inflation did not execute, so UI mode is not interpretable as an adaptive manipulation for hand. The interpretable test of adaptation comes from the **gaze-only UI Mode follow-up model** below.

#### Gaze-Only Follow-up: UI Mode (Primary Adaptive Test)

Type III Analysis of Variance Table with Satterthwaite's method
        Sum Sq Mean Sq NumDF  DenDF F value  Pr(>F)  
ui_mode    170     170     1 240.66  3.6389 0.05763 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

**Estimated Marginal Means (Gaze-only):**


Table: Estimated Marginal Means for Overall TLX: Gaze-only

|UI Mode  | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|--------:|------------:|------------:|
|Static   |     45.4|         42.1|         48.6|
|Adaptive |     46.9|         43.5|         50.2|

**Contrast (Gaze-only, Holm-adjusted):**


|contrast          | estimate|    SE|      df| t.ratio| p.value|
|:-----------------|--------:|-----:|-------:|-------:|-------:|
|static - adaptive |   -1.508| 0.792| 242.818|  -1.903|   0.058|

**Modality × UI Mode Interaction:** The interaction was non-significant, F(1, 550.1) = 2.02, p = .156, η²p = 0.004 (negligible effect). The effect of UI mode on workload did not differ significantly between modalities.


#### Estimated Marginal Means (Overall TLX by Modality × UI Mode)


Table: Estimated Marginal Means for Overall TLX by Condition (95% CI)

|Modality |UI Mode  | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|:--------|--------:|------------:|------------:|
|Hand     |Static   |     41.2|         37.8|         44.5|
|Gaze     |Static   |     45.7|         42.3|         49.0|
|Hand     |Adaptive |     41.0|         37.5|         44.4|
|Gaze     |Adaptive |     47.3|         43.8|         50.7|

Advanced TLX Analysis: UX Insights

Research Questions:

- Which subscales drive overall workload? Are there different workload profiles for hand vs. gaze?
- Is there a performance-workload trade-off? Do participants who report lower workload perform better?
- How do workload sources differ between modalities?

These analyses provide deeper UX insights into workload patterns and their relationship to performance.

### Workload Profile Analysis


**Dominant Workload Sources:**


### Workload-Performance Relationship


### Modality-Specific Workload Patterns

**Workload Differences: Hand vs. Gaze (by UI Mode)**


### Individual Differences in Workload

**Participants with Highest Average Workload (Top 5):**


**Participants with Lowest Average Workload (Bottom 5):**


**Workload Consistency Across Conditions:**
Participants with low SD report similar workload across all conditions.
Participants with high SD show large workload differences between conditions.


**Interpretation:**
- **Subscale contributions** reveal which aspects of workload are most prominent in each condition.
- **Workload-performance correlations** show whether lower workload is associated with better performance.
- **Workload efficiency** quantifies performance per unit of reported workload.
- **Modality differences** highlight which subscales are most affected by input modality.
- **Individual differences** identify participants who consistently report high/low workload.
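In the simplest form, the workload-performance correlations described above are Pearson correlations between participant-level aggregates (one TLX mean and one performance mean per participant), which avoids treating repeated blocks as independent. A minimal standard-library sketch; the data and the choice of throughput as the performance metric are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical participant-level means: overall TLX and throughput (bits/s).
tlx = [35.0, 42.0, 47.0, 51.0, 58.0]
throughput = [4.1, 3.9, 3.6, 3.7, 3.2]
print(round(pearson_r(tlx, throughput), 3))
```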

9. Participant Awareness & Strategy (Debrief Analysis)

Research Question: Did participants notice the adaptive interface? Did they change their strategy? How do awareness and strategy relate to performance?

Sample Size: N = 41 participants with debrief responses.

This section analyzes post-experiment debrief responses to understand participant awareness of the adaptive interface and self-reported strategy changes.

**Debrief Response Coverage:** 41 participants provided debrief responses.

### Thematic Analysis of Debrief Responses

**Q1: Did you notice the interface adapting?**


**Sample Responses by Category:**


**Not Noticed:**
- "I noticed size differences, but no dynamic growth. I did not notice any text change."
- "I saw that the interface and targets changed, but I did not realize that it was a result of my performance."
- "No"

**Noticed Size Changes:**
- "sometimes they got bigger"
- "During Gaze Mode it was hard to track the target due to bloom (circles) appearing continuously as I tried to hit the targets."
- "I thought changes were random."

**Noticed:**
- "Yes"
- "Yes the space bar was harder to use."

**Other/Unclear:**
- "a little."
- "I had to scroll laterally slightly for some to fir the time box in the screen"


**Q2: Did you change your strategy during the experiment?**


**Sample Responses by Category:**


**No Strategy Change:**
- "Regardless of trial, I attempted to click the designated target as quickly as possible; no conscious change to technique."
- "No"
- "it was almost halfway through the experiment when I  noticed the right side bar provides feedback on how fast or slow I'm going, and I began to peek a..."

**Focused on Speed:**
- "When the targets became easier, I focused less on accuracy and focused more on speed, as the larger targets required less focus and accuracy."
- "easy targets made me faster."
- "I went faster when the targets were bigger because I was less worried about making a mistake."

**Adapted to Easier Targets:**
- "I proceeded more slowly when the target shrank in size."
- "When targets became smaller, I would slow down because it's harder to get the placement inside a smaller radius."
- "When the target gets smaller, I pay more attention to clicking in the right spot."

**Strategy Changed:**
- "Yes"
- "yes, Instead of focusing on the red dots, I focused on pressing the space when the "press space" button showed up"
- "I changed my strategy to move cursor as fast as possible in space clicking tasks, because it was easier to use both hands instead of using just mouse ..."

**Other/Unclear:**
- "I had to slow up with the shaking dot."

**Focused on Accuracy:**
- "I did go a bit slower for smaller targets. I would move the mouse quickly to their general area then slow down and focus onto it"


### Relationship to Performance

**Performance by Adaptation Awareness:**


**Performance by Strategy Category:**


**Interpretation:**
- Participants who noticed adaptation may have different performance patterns.
- Strategy changes (e.g., focusing on speed vs. accuracy) may relate to performance outcomes.
- These relationships are exploratory and should be interpreted with caution due to self-report biases.

10. Learning Curves & Practice Effects

Research Question: How does performance change within each condition? Do learning rates differ by condition?

Sample Size: N = 81 participants with trial-level data.

Note: These learning curves serve as a quality check that participants improved modestly and reached a plateau; we do not treat these as primary inferential outcomes. This analysis is exploratory/QC only.

This section shows learning curves aligned by condition start (accounting for Williams counterbalancing). For block-level trends, see Section 13.
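As background for the alignment step, a Williams design for an even number of conditions can be generated from a first row of the form 0, 1, n-1, 2, n-2, ..., with each later row shifted by one (mod n). The sketch assumes all eight modality × UI mode × pressure cells were counterbalanced together; the study's actual order table is not shown here, and `williams_square` is a hypothetical helper.

```python
def williams_square(n: int):
    """Balanced Latin square (Williams design) for an even number of conditions.

    Each row is one presentation order; every condition appears once per row and
    once per position, and each condition immediately follows every other condition
    exactly once across rows.
    """
    if n % 2:
        raise ValueError("this single-square construction balances carryover only for even n")
    # First row: 0, 1, n-1, 2, n-2, 3, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Remaining rows: add the row index modulo n.
    return [[(c + r) % n for c in first] for r in range(n)]


for row in williams_square(8):  # 8 = modality x ui_mode x pressure cells
    print(row)
```

Across the eight orders every condition appears once in each serial position, so block-position effects are balanced across conditions.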

Learning Curve Data Summary by Condition (N = 81 participants)

|Modality |UI Mode  |Pressure | N Positions| Mean RT (s)| Mean Error Rate|
|:--------|:--------|:--------|-----------:|-----------:|---------------:|
|Hand     |Static   |OFF      |          27|       1.087|          0.0188|
|Hand     |Static   |ON       |          27|       1.106|          0.0159|
|Hand     |Adaptive |OFF      |          27|       1.081|          0.0155|
|Hand     |Adaptive |ON       |          27|       1.101|          0.0198|
|Gaze     |Static   |OFF      |          27|       1.169|          0.1844|
|Gaze     |Static   |ON       |          27|       1.182|          0.2067|
|Gaze     |Adaptive |OFF      |          27|       1.235|          0.1840|
|Gaze     |Adaptive |ON       |          27|       1.242|          0.1809|

Learning Curves: Movement Time Within Condition. N = 81 participants. Learning aligned by position within condition (accounting for counterbalancing). LOESS smoothing. Lower is better. Shaded regions show 95% CI.

Error Rate Summary by Condition

|Modality |UI Mode  |Pressure | N Positions| Mean Error Rate| Min Error Rate| Max Error Rate|
|:--------|:--------|:--------|-----------:|---------------:|--------------:|--------------:|
|Hand     |Static   |OFF      |          27|           1.88%|          0.00%|          4.11%|
|Hand     |Static   |ON       |          27|           1.59%|          0.00%|          4.05%|
|Hand     |Adaptive |OFF      |          27|           1.55%|          0.00%|          5.41%|
|Hand     |Adaptive |ON       |          27|           1.98%|          0.00%|          8.00%|
|Gaze     |Static   |OFF      |          27|          18.44%|         11.54%|         24.34%|
|Gaze     |Static   |ON       |          27|          20.67%|         10.13%|         30.38%|
|Gaze     |Adaptive |OFF      |          27|          18.40%|         13.75%|         26.58%|
|Gaze     |Adaptive |ON       |          27|          18.09%|          8.97%|         28.21%|

Learning Curves: Error Rate Within Condition. Learning aligned by position within condition (accounting for counterbalancing). LOESS smoothing. Lower is better. Shaded regions show 95% CI.

Note: Data aligned by position within condition to account for Williams counterbalancing. For block-level trends, see Section 13: Block Order & Temporal Effects.


11. Movement Quality Metrics

Submovement Analysis

Research Question: Does adaptive UI reduce movement corrections? How do submovements relate to performance?

Submovements indicate intermittent control: fewer submovements suggest smoother, more ballistic movements.
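A minimal sketch of submovement detection as velocity-peak counting, the idea behind the hand-modality notes later in this section. It is an illustration under stated assumptions: real pipelines usually smooth the speed signal and apply amplitude or timing criteria, plateaus are not detected by this simple rule, and whether the primary peak counts as a submovement is a design choice not specified here. `count_velocity_peaks` and `min_peak` are hypothetical.

```python
def count_velocity_peaks(speed, min_peak=0.0):
    """Count interior local maxima in a sampled speed profile that exceed min_peak.

    A sample counts as a peak when it is strictly greater than both neighbours.
    """
    peaks = 0
    for i in range(1, len(speed) - 1):
        if speed[i] > speed[i - 1] and speed[i] > speed[i + 1] and speed[i] > min_peak:
            peaks += 1
    return peaks


# Hypothetical speed samples: a primary peak followed by one corrective peak.
print(count_velocity_peaks([0, 1, 3, 1, 0.5, 2, 0.5, 0], min_peak=0.8))  # 2
```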

Planned Sample Size & Power

Submovement count is a noisier movement-quality metric and is currently based on pre-computed peaks. We anticipate small-to-medium effects of UI mode (adaptive reducing corrective movements) and medium effects of modality, but with considerable between-participant variability. For such count-based metrics, simulation-based power analysis is strongly recommended (e.g., using the approach in Kumle et al., 2021). As a rule of thumb, N = 64–72 would be needed to treat submovement differences as confirmatory (especially for UI-mode effects), whereas N = 48 is more appropriate for exploratory visualization and effect-size estimation rather than strict NHST.

Data Availability Note: Submovement metrics are available for a subset of the sample (see counts below). Results in this section are descriptive engineering diagnostics.

- Participants with submovement_count (legacy, pre-computed): N = 3
- Participants with submovement_count_recomputed (from trajectory data): N = 71
- Participants with full trajectory JSON data: N = 71

Submovement Count by Condition (N = 71 participants, using submovement_count_recomputed)

|modality |ui_mode  |pressure | N_participants| N_trials| Mean|   SD| Median|
|:--------|:--------|:--------|--------------:|--------:|----:|----:|------:|
|hand     |static   |0        |             63|     1715| 0.00| 0.00|      0|
|hand     |static   |1        |             65|     1748| 0.00| 0.00|      0|
|hand     |adaptive |0        |             64|     1738| 0.00| 0.00|      0|
|hand     |adaptive |1        |             65|     1758| 0.00| 0.00|      0|
|gaze     |static   |0        |             69|     1545| 8.51| 4.76|      8|
|gaze     |static   |1        |             68|     1501| 8.68| 4.62|      8|
|gaze     |adaptive |0        |             70|     1572| 8.85| 5.26|      8|
|gaze     |adaptive |1        |             68|     1540| 9.06| 5.04|      8|

ℹ **Note:** Hand modality shows zero submovements, indicating very smooth movements with no detected velocity peaks (submovements). This is valid data. Plot shows gaze modality only.

Submovements vs. Index of Difficulty. N = 74 participants. How movement corrections scale with task difficulty. Linear regression with 95% confidence intervals.

Verification Time Analysis

Research Question: How much time is spent “stopping” vs. “moving”? Does adaptive UI reduce verification time?

Sample Size: N = 81 participants with verification time data.

Planned Sample Size & Power

Verification time (from first target entry to final selection) is conceptually closer to a decision-phase measure and serves as a bridge to future LBA modeling. We again expect medium modality effects and small-to-medium UI-mode effects, and we analyze it via an LMM. Because this outcome is continuous and based on many trials per participant, N = 48 is a good target for medium effects, and N = 64 provides added stability for smaller UI-mode differences or more complex interaction patterns. The same repeated-measures power guidelines apply as for RT and TP (Cohen, 1988).

Verification time represents the “precise stopping” phase, separate from the ballistic movement phase.
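Following the definition above (verification time runs from first target entry to final selection), each trial splits into a movement phase and a verification phase. The sketch assumes hypothetical per-trial timestamps in milliseconds; `TrialTimes` and `decompose` are illustrative names, and trials that never enter the target have no defined verification time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialTimes:
    """Hypothetical per-trial timestamps in milliseconds."""
    start_ms: float
    first_entry_ms: Optional[float]  # None if the cursor never entered the target
    selection_ms: float


def decompose(trial: TrialTimes):
    """Split a trial into (movement, verification) durations in ms, or None."""
    if trial.first_entry_ms is None:
        return None  # no target entry, so verification time is undefined
    movement = trial.first_entry_ms - trial.start_ms          # ballistic approach phase
    verification = trial.selection_ms - trial.first_entry_ms  # "precise stopping" phase
    return movement, verification


print(decompose(TrialTimes(start_ms=0.0, first_entry_ms=640.0, selection_ms=1010.0)))
```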


### Target Dwell Time


### Verification Phase Decomposition
Confirmation Event Source by Condition. What triggered the final confirmation?

|modality |ui_mode  |pressure |confirm_event_source | N_trials| Pct (% of total)|
|:--------|:--------|:--------|:--------------------|--------:|----------------:|
|hand     |static   |0        |click                |     1458|             12.1|
|hand     |static   |1        |click                |     1469|             12.2|
|hand     |adaptive |0        |click                |     1485|             12.3|
|hand     |adaptive |1        |click                |     1483|             12.3|
|gaze     |static   |0        |space                |     1545|             12.8|
|gaze     |static   |1        |space                |     1501|             12.5|
|gaze     |adaptive |0        |space                |     1572|             13.0|
|gaze     |adaptive |1        |space                |     1540|             12.8|

12. Error Patterns & Types

Research Question: What types of errors occur? Do error patterns differ by condition?

Sample Size: N = 81 participants with error type data.


**Error Type Summary:** Overall error rates were 19% for gaze and 1.7% for hand. Error patterns differed substantially by modality: gaze errors were predominantly slips (99%), while hand errors were predominantly misses (93.1%). This pattern is consistent with the characteristics of each modality: gaze is more prone to accidental selections (slips) due to the Midas touch problem, whereas hand pointing is more prone to missing targets. Adaptive UI did not yet show a clear reduction in any specific error type at N = 81.

13. Block Order & Temporal Effects

Research Question: Are there order effects? Does performance improve or degrade over blocks?

Sample Size: N = 81 participants with block-level data.

Note: This section is exploratory/QC only. These analyses serve as quality checks for temporal trends and are not treated as primary inferential outcomes.

Performance Across Blocks: Movement Time. Movement time by block number. Lower is better. Shaded regions show ±1 SE.
Block-Level Data Summary by Condition

|Modality |UI Mode  |Pressure | N Blocks| Mean Error Rate|
|:--------|:--------|:--------|--------:|---------------:|
|Hand     |Static   |OFF      |        8|           1.81%|
|Hand     |Static   |ON       |        8|           1.44%|
|Hand     |Adaptive |OFF      |        8|           1.38%|
|Hand     |Adaptive |ON       |        8|           1.90%|
|Gaze     |Static   |OFF      |        8|          18.27%|
|Gaze     |Static   |ON       |        8|          20.71%|
|Gaze     |Adaptive |OFF      |        8|          18.66%|
|Gaze     |Adaptive |ON       |        8|          18.55%|

Performance Across Blocks: Error Rate. Error rate by block number. Lower is better. Shaded regions show ±1 SE.

14. Spatial Patterns & Heatmaps

Research Question: Are there spatial biases in performance? Do some screen regions show better/worse performance? Do error patterns differ between conditions?

Sample Size: N = 81 participants with spatial position data.

Note: This section includes both descriptive visualizations and inferential statistical tests. At N=81, spatial analyses provide insights into XR-specific patterns (e.g., top vs bottom of visual field) and condition differences.

Performance by Target Position


### Statistical Tests: Spatial Position Effects

**Movement Time by Screen Region (Horizontal):**


Table: ANOVA: RT ~ Screen Region × Modality × UI Mode

|                               |  Sum Sq| Mean Sq| NumDF|    DenDF|  F value| Pr(>F)|
|:------------------------------|-------:|-------:|-----:|--------:|--------:|------:|
|screen_region                  |  0.2125|  0.1062|     2| 14940.86|   1.2412| 0.2891|
|modality                       | 15.2025| 15.2025|     1| 14892.98| 177.5954| 0.0000|
|ui_mode                        |  1.1998|  1.1998|     1| 14862.26|  14.0157| 0.0002|
|screen_region:modality         |  0.0411|  0.0206|     2| 14870.82|   0.2403| 0.7864|
|screen_region:ui_mode          |  0.0224|  0.0112|     2| 14862.94|   0.1311| 0.8771|
|modality:ui_mode               |  1.3645|  1.3645|     1| 14862.33|  15.9398| 0.0001|
|screen_region:modality:ui_mode |  0.1912|  0.0956|     2| 14863.17|   1.1169| 0.3273|

**Screen region effect:** F(2, 14940.9) = 1.24, p = .289.

**Pairwise Comparisons (Holm-adjusted):**


|contrast       |modality |ui_mode  | estimate|     SE| p-value|  df|
|:--------------|:--------|:--------|--------:|------:|-------:|---:|
|Left - Center  |hand     |static   |   0.0117| 0.0115|   0.712| Inf|
|Left - Right   |hand     |static   |   0.0145| 0.0123|   0.712| Inf|
|Center - Right |hand     |static   |   0.0028| 0.0123|   0.822| Inf|
|Left - Center  |gaze     |static   |  -0.0016| 0.0121|   1.000| Inf|
|Left - Right   |gaze     |static   |   0.0045| 0.0133|   1.000| Inf|
|Center - Right |gaze     |static   |   0.0061| 0.0132|   1.000| Inf|
|Left - Center  |hand     |adaptive |  -0.0065| 0.0114|   0.566| Inf|
|Left - Right   |hand     |adaptive |   0.0137| 0.0123|   0.531| Inf|
|Center - Right |hand     |adaptive |   0.0202| 0.0122|   0.298| Inf|
|Left - Center  |gaze     |adaptive |   0.0142| 0.0120|   0.708| Inf|
|Left - Right   |gaze     |adaptive |   0.0145| 0.0132|   0.708| Inf|
|Center - Right |gaze     |adaptive |   0.0003| 0.0130|   0.984| Inf|
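
Each modality × UI-mode cell above is a family of three contrasts, so Holm's step-down rule works as follows:

1. The smallest raw p-value is multiplied by 3, the next by 2 and the largest by 1.
2. The results are forced to be non-decreasing.
3. Each adjusted value is capped at 1.

This is why two contrasts in a family can share the same adjusted p-value (e.g., 0.712 for hand/static). The sketch below is a standalone illustration of that rule and is not the report's own code. The function name `holm_adjust` is hypothetical; R's `p.adjust(p, method = "holm")` has the same intent.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the raw p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted p-value may not be smaller than
        # the adjusted p-value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Three hypothetical raw p-values for one family of region contrasts.
    print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```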

**Error Rate by Screen Region (Horizontal):**


Table: ANOVA: Error Rate ~ Screen Region × Modality × UI Mode

|                               | npar|   Sum Sq|  Mean Sq|  F value|
|:------------------------------|----:|--------:|--------:|--------:|
|screen_region                  |    2|   8.5883|   4.2942|   4.2942|
|modality                       |    1| 709.3786| 709.3786| 709.3786|
|ui_mode                        |    1|   2.0438|   2.0438|   2.0438|
|screen_region:modality         |    2|  41.8315|  20.9158|  20.9158|
|screen_region:ui_mode          |    2|   0.3814|   0.1907|   0.1907|
|modality:ui_mode               |    1|   0.5968|   0.5968|   0.5968|
|screen_region:modality:ui_mode |    2|   4.4252|   2.2126|   2.2126|

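**Screen region effect:** F = 4.29 for the main effect and F = 20.92 for the screen region × modality interaction. The table gives F values without denominator df or p-values, so significance is not assessed here. This is what lme4's `anova()` reports for a GLMM, where F equals Mean Sq because the dispersion is fixed at 1. Likelihood-ratio or Wald χ² tests would be needed to judge these effects.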

**Movement Time by Screen Region (Vertical):**


Table: ANOVA: RT ~ Vertical Region × Modality × UI Mode

|                                 |  Sum Sq| Mean Sq| NumDF|    DenDF|  F value| Pr(>F)|
|:--------------------------------|-------:|-------:|-----:|--------:|--------:|------:|
|vertical_region                  | 20.9956| 10.4978|     2| 14905.33| 124.6761| 0.0000|
|modality                         | 16.3887| 16.3887|     1| 14892.72| 194.6388| 0.0000|
|ui_mode                          |  1.2424|  1.2424|     1| 14862.14|  14.7548| 0.0001|
|vertical_region:modality         |  0.1022|  0.0511|     2| 14867.86|   0.6068| 0.5451|
|vertical_region:ui_mode          |  0.0955|  0.0478|     2| 14863.81|   0.5674| 0.5670|
|modality:ui_mode                 |  1.5437|  1.5437|     1| 14862.29|  18.3333| 0.0000|
|vertical_region:modality:ui_mode |  0.1166|  0.0583|     2| 14863.96|   0.6926| 0.5003|

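**Vertical region effect:** significant for movement time, F(2, 14905) = 124.68, p < .001. Its interactions with modality and UI mode are not significant (all p > .50).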

Error Density Heatmap

Where do endpoint errors occur? Are there systematic spatial biases?
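
The tests below analyze three univariate endpoint-error measures:

- signed X offset
- signed Y offset
- Euclidean distance from the target centre

The sketch shows how each selection endpoint could be decomposed into these measures. The field names are assumptions, as is the sign convention (endpoint minus target centre, in px, with screen x increasing to the right). Neither is taken from the study's logging code.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointError:
    error_x: float         # signed horizontal offset (px); assumed positive = right of centre
    error_y: float         # signed vertical offset (px); sign depends on the screen's y-axis
    error_distance: float  # Euclidean distance from the target centre (px)


def endpoint_error(endpoint_x, endpoint_y, target_x, target_y):
    """Decompose one selection endpoint into signed offsets and a distance.

    Assumes endpoint and target centre share one pixel coordinate system.
    """
    dx = endpoint_x - target_x
    dy = endpoint_y - target_y
    return EndpointError(error_x=dx, error_y=dy, error_distance=math.hypot(dx, dy))


if __name__ == "__main__":
    # Hypothetical endpoint offset by (3, 4) px from the target centre.
    print(endpoint_error(103.0, 204.0, 100.0, 200.0))
```

A non-zero mean `error_x` or `error_y` indicates a systematic bias. `error_distance` captures magnitude regardless of direction.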


### Statistical Tests: Error Density Differences

**Error Distance (Magnitude) Comparison:**


Table: ANOVA: Error Distance ~ UI Mode

|        | Sum Sq| Mean Sq| NumDF|    DenDF| F value| Pr(>F)|
|:-------|------:|-------:|-----:|--------:|-------:|------:|
|ui_mode | 0.0444|  0.0444|     1| 6859.323|  0.1156| 0.7339|

**Estimated Marginal Means (Error Distance, px):**


|ui_mode  | response|    SE|  df| asymp.LCL| asymp.UCL|
|:--------|--------:|-----:|---:|---------:|---------:|
|static   |    9.378| 0.200| Inf|     8.993|     9.779|
|adaptive |    9.431| 0.201| Inf|     9.045|     9.832|

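⚠ The error-distance model itself was fitted (see the ANOVA and marginal means above), but the pairwise contrast table could not be produced. The formatting step referenced a `t.ratio` column that does not exist. This is likely because, with asymptotic df (df = Inf), emmeans labels the test statistic `z.ratio` instead.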

**Error Bias in X-Direction:**


Table: ANOVA: Error X ~ UI Mode

|        |  Sum Sq| Mean Sq| NumDF|   DenDF| F value| Pr(>F)|
|:-------|-------:|-------:|-----:|-------:|-------:|------:|
|ui_mode | 28.7645| 28.7645|     1| 6882.32|  0.3012| 0.5831|

**Estimated Marginal Means (Error X, px):**


|ui_mode  | emmean|    SE|  df| asymp.LCL| asymp.UCL|
|:--------|------:|-----:|---:|---------:|---------:|
|static   |  0.546| 0.208| Inf|     0.138|     0.955|
|adaptive |  0.417| 0.207| Inf|     0.011|     0.822|

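**X-direction bias:** no UI-mode difference, F(1, 6882) = 0.30, p = .583. In both modes the mean X error is small but positive (≈0.4–0.5 px), with 95% CIs that exclude zero.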

**Error Bias in Y-Direction:**


Table: ANOVA: Error Y ~ UI Mode

|        |  Sum Sq| Mean Sq| NumDF|   DenDF| F value| Pr(>F)|
|:-------|-------:|-------:|-----:|-------:|-------:|------:|
|ui_mode | 71.9155| 71.9155|     1| 6875.42|  0.8379|   0.36|

**Estimated Marginal Means (Error Y, px):**


|ui_mode  | emmean|    SE|  df| asymp.LCL| asymp.UCL|
|:--------|------:|-----:|---:|---------:|---------:|
|static   |  1.367| 0.222| Inf|     0.932|     1.803|
|adaptive |  1.572| 0.221| Inf|     1.139|     2.005|

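**Y-direction bias:** no UI-mode difference, F(1, 6875) = 0.84, p = .36. Both modes show a positive mean Y error of ≈1.4–1.6 px, with 95% CIs that exclude zero.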

**Kolmogorov-Smirnov Test: Error Distance Distributions**
D = 0.0101, p = 0.9945
○ No significant difference in distributions
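
The two-sample Kolmogorov-Smirnov statistic D reported above is the largest vertical gap between the two empirical CDFs. The pure-Python sketch below computes D only. It is not the report's implementation, and it omits p-values because they require the KS null distribution. Library routines such as R's `ks.test` or SciPy's `scipy.stats.ks_2samp` provide that distribution.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D = max over x of |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    d = 0.0
    # Both ECDFs are step functions that jump only at observed values, so the
    # largest gap is found by evaluating them at every pooled observation.
    for x in a + b:
        f_a = bisect_right(a, x) / n_a  # share of sample a that is <= x
        f_b = bisect_right(b, x) / n_b  # share of sample b that is <= x
        d = max(d, abs(f_a - f_b))
    return d
```

A D of 0.0101, as observed here, means the two error-distance ECDFs never differ by more than about one percentage point.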

**Note:** 2D spatial pattern comparisons (e.g., 2D KS test) would require specialized packages.
Current analysis focuses on univariate comparisons (distance, X-bias, Y-bias).

15. Adaptive UI Mechanism Analysis

Root-Cause Diagnostic: Width Scaling Non-Activation

Research Question: Why did hand width inflation (width_scale_factor) fail to activate? Was this due to strict thresholds/gates or a bug/misconfiguration?

### Diagnostic: Why Did Hand Width Scaling Not Activate?
**Trigger-related columns in df_raw:**
- adaptation_triggered
- timeout_triggered
- width_scale_factor
- alignment_gate_false_triggers
- debrief_q1_adaptation_noticed 

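**Trigger summary for HAND/Adaptive/Pressure ON:**

Cells showing NaN / NA / -Inf mean the column had no non-missing values for that participant in this condition. These are R's results for `mean()`, `median()`, and `max()` of an empty vector.

For width_scale_factor, pct_nonzero counts values different from 0, so 100% reflects the 1.0 baseline, not active scaling. The share of scaled trials (values ≠ 1.0) is reported as pct_scaled in the "Scaling following triggers" table below.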

- adaptation_triggered:


|pid  |unique_vals | n_non_na|
|:----|:-----------|--------:|
|P001 |FALSE       |       27|
|P002 |FALSE       |       27|
|P003 |FALSE       |       27|
|P004 |FALSE       |       27|
|P005 |FALSE       |       27|
|P006 |FALSE       |       27|
|P007 |FALSE       |       27|
|P008 |FALSE       |       27|
|P009 |FALSE       |       27|
|P010 |FALSE       |       27|

- timeout_triggered:


|pid  |unique_vals | n_non_na|
|:----|:-----------|--------:|
|P001 |FALSE       |       27|
|P004 |FALSE       |       27|
|P005 |FALSE       |       27|
|P006 |FALSE       |       27|
|P011 |FALSE       |       27|
|P012 |FALSE       |       27|
|P013 |FALSE       |       27|
|P014 |FALSE       |       27|
|P016 |FALSE       |       27|
|P017 |FALSE       |       27|

- width_scale_factor:


|pid  | mean_val| median_val| max_val| pct_nonzero|
|:----|--------:|----------:|-------:|-----------:|
|P001 |        1|          1|       1|         100|
|P002 |      NaN|         NA|    -Inf|         NaN|
|P003 |      NaN|         NA|    -Inf|         NaN|
|P004 |        1|          1|       1|         100|
|P005 |        1|          1|       1|         100|
|P006 |        1|          1|       1|         100|
|P007 |      NaN|         NA|    -Inf|         NaN|
|P008 |      NaN|         NA|    -Inf|         NaN|
|P009 |        1|          1|       1|         100|
|P010 |        1|          1|       1|         100|

- alignment_gate_false_triggers:


|pid  | mean_val| median_val| max_val| pct_nonzero|
|:----|--------:|----------:|-------:|-----------:|
|P001 |     0.37|          0|       2|       33.33|
|P002 |      NaN|         NA|    -Inf|         NaN|
|P003 |      NaN|         NA|    -Inf|         NaN|
|P004 |     0.22|          0|       1|       22.22|
|P005 |     0.07|          0|       1|        7.41|
|P006 |      NaN|         NA|    -Inf|         NaN|
|P007 |      NaN|         NA|    -Inf|         NaN|
|P008 |      NaN|         NA|    -Inf|         NaN|
|P009 |      NaN|         NA|    -Inf|         NaN|
|P010 |      NaN|         NA|    -Inf|         NaN|

- debrief_q1_adaptation_noticed:


|pid  |unique_vals                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | n_non_na|
|:----|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------:|
|P001 |I noticed size differences, but no dynamic growth. I did not notice any text change.                                                                                                                                                                                                                                                                                                                                                                                               |       27|
|P002 |I saw that the interface and targets changed, but I did not realize that it was a result of my performance.                                                                                                                                                                                                                                                                                                                                                                        |       27|
|P003 |No                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |       27|
|P006 |sometimes they got bigger                                                                                                                                                                                                                                                                                                                                                                                                                                                          |       27|
|P007 |No                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |       27|
|P013 |I noticed the jitter and drift. At first, I thought this is a design flaw! Then I realized that this has been planted for a reason!                                                                                                                                                                                                                                                                                                                                                |       27|
|P015 |I did not notice any change                                                                                                                                                                                                                                                                                                                                                                                                                                                        |       27|
|P016 |I noticed the targets change size over the trials, but did not realize they were changing in response to my performance.                                                                                                                                                                                                                                                                                                                                                           |       27|
|P019 |No                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |       27|
|P021 |I didn't really notice the interface change, specifically because I made mistakes or no, however I primarily focused on the gaze model (the jitter portion where I was aiming) to see if it changed or not, due to whether I made more mistakes.  I was unable to figure out if it did change or not, though.  One thing I noticed was that sometimes the input initially would tell me I would be using the hand mode, but then instead I would use the gaze mode and vice versa. |       27|

**Scaling following triggers:**


|adaptation_triggered |timeout_triggered | width_scale_factor| alignment_gate_false_triggers|debrief_q1_adaptation_noticed                                                                                                                                                                                                                                                                                                                                                                                                                                                      | n_trials| mean_width_scale| pct_scaled|
|:--------------------|:-----------------|------------------:|-----------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------:|----------------:|----------:|
|FALSE                |FALSE             |                  1|                             0|During Gaze Mode it was hard to track the target due to bloom (circles) appearing continuously as I tried to hit the targets.                                                                                                                                                                                                                                                                                                                                                      |       24|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I constantly tried to get the objects. The objects get 2 different shapes and variable sizes. I tried to catch the objects. Sometimes, rectangular objects also moved away from mouse.                                                                                                                                                                                                                                                                                             |       23|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I did not notice                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |       25|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I did not notice the interface changing.                                                                                                                                                                                                                                                                                                                                                                                                                                           |       27|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I did notice that sometimes targets were inflated, but didn't figure out that the inflation would be related to my prior responses. I did not notice any changes related to the decluttered interface.                                                                                                                                                                             |       27|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I didn't actually notice that the dot size for hand mode was connected to my performance. For gaze mode, I noticed an orange shape sometimes, which occasionally delayed my reaction time, but didn't notice the text dimming that was mentioned above.                                                                                                                                                                                                                            |       25|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I didn't make that many mistakes, so I didn't feel the change as much...                                                                                                                                                                                                                                                                                                                                                                                                           |       27|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I didn't notice the interface changing but I did notice the start button shifting each time.                                                                                                                                                                                                                                                                                                                                                                                       |       25|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I didn't really notice a pattern adapting to my performance - maybe it was because I wasn't really sure how I was doing until the end                                                                                                                                                                                                                                                                                                                                              |       20|                1|          0|
|FALSE                |FALSE             |                  1|                             0|I didn't really notice the interface change, specifically because I made mistakes or no, however I primarily focused on the gaze model (the jitter portion where I was aiming) to see if it changed or not, due to whether I made more mistakes.  I was unable to figure out if it did change or not, though.  One thing I noticed was that sometimes the input initially would tell me I would be using the hand mode, but then instead I would use the gaze mode and vice versa. |       25|                1|          0|
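
The sketch below summarizes one numeric trigger column per participant. It avoids the NaN / NA / -Inf placeholders seen above and separates "non-zero" from "scaled" (scale factor ≠ 1.0). It is written in Python for illustration only. The function name, the `baseline` parameter, and the output keys are hypothetical and only loosely mirror the table columns.

```python
from statistics import mean, median


def summarize_column(values, baseline=1.0):
    """Summarize one participant's values for a numeric trigger column.

    Missing entries (None) are dropped. If nothing remains, each statistic is
    None rather than the NaN / NA / -Inf R returns for an empty vector.
    pct_scaled counts values that differ from `baseline` (1.0 = no scaling);
    it is only meaningful for scale-factor columns.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"n_non_na": 0, "mean": None, "median": None, "max": None,
                "pct_nonzero": None, "pct_scaled": None}
    n = len(present)
    return {
        "n_non_na": n,
        "mean": mean(present),
        "median": median(present),
        "max": max(present),
        "pct_nonzero": 100.0 * sum(v != 0 for v in present) / n,
        "pct_scaled": 100.0 * sum(v != baseline for v in present) / n,
    }


if __name__ == "__main__":
    # 27 trials that all log the 1.0 baseline: 100% non-zero, 0% scaled.
    print(summarize_column([1.0] * 27))
```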

Width Scaling (Target Size Adaptation)

Research Question: Does the adaptive UI dynamically change target sizes? How does width scaling relate to performance?

Sample Size: N = 74 participants with width scaling data.

Status: In the current dataset, width scaling never activated. All recorded width_scale_factor values equal 1.0, consistent with the mechanism being disabled or misconfigured. Results here serve as a template for future analysis once scaling is active.

The adaptive UI may scale target widths based on performance. This section examines whether and how target sizes are adjusted.

**Note:** No target width scaling was observed in this dataset.
All `width_scale_factor` values are 1.0 (no scaling applied).

This indicates that the adaptive policy did not trigger during data collection.
Possible reasons:
- Hysteresis gate threshold not met (requires N consecutive slow/error trials)
- Performance thresholds (RT p75, error burst) not exceeded
- Adaptive policy not properly configured or enabled
- Participants performed well enough that adaptation was not needed
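
The first two reasons describe a gated trigger. The sketch below shows how such a hysteresis gate could behave under these assumptions:

- A trial counts as "poor" if it is an error or slower than the 75th percentile of the participant's preceding RTs.
- Scaling switches on only after `n_consecutive` poor trials in a row.
- Scaling switches off only after the same number of good trials.

The parameter names, the running-percentile rule, and the release rule are all illustrative assumptions, not the study's actual policy. The sketch does show why a strict gate can stay closed for participants who rarely produce long runs of slow or erroneous trials.

```python
from statistics import quantiles


def hysteresis_gate(rts_ms, errors, n_consecutive=3, warmup=8):
    """Return, for each trial, whether adaptation is active after that trial.

    Hypothetical policy: a trial is "poor" if it is an error or its RT exceeds
    the 75th percentile of all preceding RTs (only once at least `warmup`
    trials exist). Adaptation turns on after `n_consecutive` poor trials in a
    row and turns off after `n_consecutive` good trials in a row.
    """
    active = False
    poor_streak = good_streak = 0
    states = []
    for i, (rt, err) in enumerate(zip(rts_ms, errors)):
        history = rts_ms[:i]
        # quantiles(n=4)[2] is the 75th percentile (needs >= 2 data points).
        slow = len(history) >= max(warmup, 2) and rt > quantiles(history, n=4)[2]
        if err or slow:
            poor_streak, good_streak = poor_streak + 1, 0
        else:
            poor_streak, good_streak = 0, good_streak + 1
        if not active and poor_streak >= n_consecutive:
            active = True
        elif active and good_streak >= n_consecutive:
            active = False
        states.append(active)
    return states
```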
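Table: Target Width Scaling by Condition (N = 74 participants, No Scaling Observed)

|modality |ui_mode  | pressure| N_participants| N_trials| Mean_Scale| SD_Scale| Mean_Diff| SD_Diff| Pct_Scaled|
|:--------|:--------|--------:|--------------:|--------:|----------:|--------:|---------:|-------:|----------:|
|hand     |static   |        0|             72|     1971|          1|        0|         0|       0|          0|
|hand     |static   |        1|             74|     2025|          1|        0|         0|       0|          0|
|hand     |adaptive |        0|             73|     1998|          1|        0|         0|       0|          0|
|hand     |adaptive |        1|             74|     2025|          1|        0|         0|       0|          0|
|gaze     |static   |        0|             72|     1971|          1|        0|         0|       0|          0|
|gaze     |static   |        1|             73|     1998|          1|        0|         0|       0|          0|
|gaze     |adaptive |        0|             74|     2025|          1|        0|         0|       0|          0|
|gaze     |adaptive |        1|             71|     1944|          1|        0|         0|       0|          0|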

**Note:** The width scale factor plot and the width-scaling-over-time plot are not shown because all scale factors are 1.0 (no scaling occurred). Plots of constant values would not be informative. The adaptive policy did not trigger during data collection.

**Note:** Width scale factor vs. performance plot is not shown because all width scale factors are 1.0 (no scaling occurred).

**Why this matters:** This plot would show whether larger targets (scale factor > 1.0) improve performance by reducing movement time.
However, since the adaptive policy did not trigger during data collection, all targets remained at their nominal size.
As a result, there is no variation in the width scale factor, making it impossible to assess the performance relationship.


Alignment Gate Metrics

Research Question: If alignment gates are used, how do they affect performance? How often are false triggers detected?

Alignment gates may be used to ensure proper cursor alignment before selection. This section examines their usage and effectiveness.


**Alignment Gate Interpretation:** False triggers were rare (mean = 0.06 per trial). Adaptive UI did not show a meaningful change in false trigger rate compared to Static.
ℹ **Note:** Gaze modality shows zero false triggers (alignment gates are hand-only). Plot shows hand modality only.

Alignment Gate False Triggers by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.
ℹ **Note:** No recovery time data for gaze modality. Alignment gates apply only to the hand modality, so no gate events (and therefore no recovery times) are logged for gaze trials.

Alignment Gate Recovery Time by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.
ℹ **Note:** No mean recovery time data for gaze modality. Alignment gates apply only to the hand modality, so no gate events are logged for gaze trials.

Alignment Gate Mean Recovery Time by Condition. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons. Lower is better.

Task Type Analysis

Research Question: Are there different task types (point vs. drag)? How does performance differ across task types?

The experiment includes two task types (point and drag); this section compares performance between them.

Table: Performance by Task Type

|task_type |modality |ui_mode  | N_Trials| Mean_RT (ms)| SD_RT (ms)| Error_Rate (%)|
|:---------|:--------|:--------|--------:|------------:|----------:|--------------:|
|drag      |hand     |static   |     2664|       1101.0|      475.0|           1.31|
|drag      |hand     |adaptive |     2682|       1096.7|      466.0|           1.53|
|drag      |gaze     |static   |     2553|       1203.3|      582.1|          17.04|
|drag      |gaze     |adaptive |     2592|       1228.9|      539.7|          16.40|
|point     |hand     |static   |     1332|       1085.5|      435.9|           1.35|
|point     |hand     |adaptive |     1341|       1094.1|      442.1|           1.12|
|point     |gaze     |static   |     1279|       1194.2|      541.5|          16.73|
|point     |gaze     |adaptive |     1289|       1244.8|      529.7|          16.06|

Movement Time by Task Type. Raincloud plot: half-violins with boxplots inside, individual points. Comparison of performance across the two task types (point, drag). Lower is better.

Planned Sample Size & Power

Path-length efficiency (straight-line amplitude / actual path length, so 1.0 means a perfectly straight path) is analyzed at the trial level but interpreted as a within-subject continuous outcome. We expect medium modality differences (longer, less efficient paths for gaze) and small-to-medium UI-mode effects. We treat N = 48 as a reasonable “good N” for detecting medium effects (dz ≈ 0.4–0.5), and N = 64 as an ideal target if path efficiency becomes more central to the argument. At both Ns, this analysis is secondary to the core throughput and RT results.

Table: Path Length and Efficiency Metrics by Condition

|modality |ui_mode  | pressure| N_Trials| Mean_Path_Length| Mean_Amplitude| Mean_Ratio| Mean_Efficiency| Mean_Excess| Mean_RT (ms)|
|:--------|:--------|--------:|--------:|----------------:|--------------:|----------:|---------------:|-----------:|------------:|
|hand     |static   |        0|     1624|            715.5|          371.2|       2.28|           0.535|       344.3|       1050.3|
|hand     |static   |        1|     1657|            717.2|          373.0|       2.28|           0.536|       344.2|       1068.6|
|hand     |adaptive |        0|     1661|            717.2|          373.6|       2.28|           0.538|       343.6|       1063.9|
|hand     |adaptive |        1|     1666|            715.4|          371.8|       2.27|           0.535|       343.6|       1064.0|
|gaze     |static   |        0|     1339|            616.8|          351.2|       1.92|           0.588|       265.6|       1175.5|
|gaze     |static   |        1|     1332|            644.0|          355.0|       1.98|           0.575|       289.0|       1171.1|
|gaze     |adaptive |        0|     1358|            632.5|          351.3|       1.95|           0.583|       281.2|       1225.9|
|gaze     |adaptive |        1|     1343|            681.9|          356.2|       2.07|           0.554|       325.7|       1232.9|
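
The per-trial quantities behind this table can be computed as in the sketch below. It assumes the trajectory is a list of (x, y) samples in px and takes the amplitude as the straight-line distance from the first to the last sample. The report's Mean_Amplitude may instead use the nominal target distance, so treat that choice as an assumption.

Because the table averages per-trial ratios, Mean_Ratio need not equal Mean_Path_Length / Mean_Amplitude. For hand/static/0, 715.5 / 371.2 ≈ 1.93, versus the tabulated 2.28. Mean_Excess, by contrast, does equal the difference of the two means, since averaging is linear (715.5 − 371.2 = 344.3).

```python
import math


def path_metrics(points):
    """Per-trial path metrics from a list of (x, y) cursor samples (px).

    Assumption: amplitude is the straight-line distance between the first and
    last samples.
    """
    if len(points) < 2:
        raise ValueError("need at least two samples")
    # Sum of the lengths of consecutive segments along the recorded path.
    path_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    amplitude = math.dist(points[0], points[-1])
    if amplitude == 0:
        raise ValueError("degenerate trajectory: zero amplitude")
    return {
        "path_length": path_length,
        "amplitude": amplitude,
        "ratio": path_length / amplitude,       # >= 1; 1.0 = perfectly straight
        "efficiency": amplitude / path_length,  # <= 1; 1.0 = perfectly straight
        "excess": path_length - amplitude,      # extra distance travelled
    }
```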

Path Length vs. Movement Time (Log-Log Scale). 2D density plot showing the relationship between actual cursor path length and movement time. GAM smooth captures nonlinearity. Log scales handle right-skewed distributions and heteroscedasticity.

Path Efficiency vs. Movement Time. Path efficiency (A / path length) indicates how straight the movement was. Higher efficiency (closer to 1.0) means straighter paths. This plot shows whether inefficient movements lead to longer movement times, and whether adaptive UI improves efficiency.
⚠ Cannot create ID bins: insufficient variation or invalid break points.
Skipping ID binning plot.

Individual Differences in Path Efficiency. Thin lines show per-participant mean efficiency by UI mode. Thick line and large point show condition mean. Shows whether adaptive UI consistently improves efficiency across participants.

16. Gaze-Specific Analysis: Hover/Dwell Time

Research Question: How does hover/dwell time vary across gaze conditions? Does adaptive UI affect dwell time before confirmation?

Planned Sample Size & Power

Hover/dwell time is modeled only for gaze trials with fixed effects for UI mode and pressure. Because this shrinks the effective dataset and the expected UI-mode effects may be small-to-medium (dz ≈ 0.3–0.5), we treat this analysis as exploratory unless N ≥ 64. At N = 48, the study is adequately powered for medium effects but underpowered for smaller ones; at N = 64, we expect ≈0.80 power even if the UI-mode effect is closer to dz ≈ 0.35, based on standard repeated-measures calculations and mixed-model heuristics (Cohen, 1988; Kumle et al., 2021).
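
The ≈0.80 figure for dz ≈ 0.35 at N = 64 can be checked with the normal approximation to a two-sided paired t-test:

power ≈ Φ(dz·√N − z₁₋α/₂) + Φ(−dz·√N − z₁₋α/₂)

This sketch uses only the Python standard library. It ignores the heavier tails of the t-distribution and the mixed-model structure, so it is a rough check, not a substitute for simulation-based power analysis (Kumle et al., 2021). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(dz, n, alpha=0.05):
    """Approximate power of a two-sided paired t-test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = dz * sqrt(n)  # expected test statistic under the alternative
    # Probability of landing beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (48, 64):
        print(n, round(paired_power(0.35, n), 2), round(paired_power(0.5, n), 2))
```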

Sample Size: N = 0 participants (no gaze hover/dwell data available).

Hover/dwell time represents the duration the cursor remains in the target before confirmation in gaze trials. This metric is specific to gaze modality and reflects the “Midas touch” problem—the need for deliberate confirmation to avoid unintended selections.

⚠ No valid hover/dwell time data available for gaze trials.

Statistical Analysis: Hover/Dwell Time

⚠ Insufficient data for Hover/Dwell Time statistical tests.

  • Hierarchical LBA (verification-time RTs) - see Section 17
  • Control-theory kinematics (velocity profiles, submovement decomposition) - see Section 18

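Implementation Notes:
- LBA requires RT data from the verification phase (time from target entry to selection)
- Model fitting can be done with the rtdists package, which provides LBA densities (RWiener implements the Wiener diffusion model rather than the LBA); the results below were fitted in PyMC
- Key parameters to estimate: drift rate (v), threshold (b), start-point range (A; each accumulator's start point is drawn uniformly from [0, A]), and non-decision time (t0)
- Hypothesis: Adaptive conditions should show a lower threshold (b), indicating less caution needed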


17. Linear Ballistic Accumulator (LBA) Analysis

Research Question: Can we model the verification phase (time from target entry to selection) using LBA parameters? Do adaptive conditions show different decision thresholds?

Linear Ballistic Accumulator models decompose reaction time into decision and non-decision components. For gaze-based interaction, we hypothesize that adaptive UI reduces decision threshold (b), indicating less caution needed when targets are easier to acquire.
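
The parameters can be made concrete with a minimal simulation of the standard LBA process:

- Each accumulator starts at a point drawn uniformly from [0, A].
- It rises linearly at a drift rate drawn from a normal distribution with mean v and SD s.
- The first accumulator to reach threshold b determines the response.
- The RT is that crossing time plus the non-decision time t0.

This is a sketch, not the PyMC model used in this report. The function name and parameter values are illustrative, and redrawing trials in which every drift is non-positive is one common convention among several.

```python
import random


def simulate_lba_trial(v, b, A, t0, s=1.0, rng=random):
    """Simulate one LBA trial and return (winning_accumulator, rt_seconds).

    v  : mean drift rate for each accumulator (e.g., [correct, error]).
    b  : response threshold shared by all accumulators (requires b >= A).
    A  : upper bound of the uniform start-point distribution U[0, A].
    t0 : non-decision time in seconds (encoding + motor execution).
    s  : between-trial SD of the drift rates.
    """
    if b < A:
        raise ValueError("threshold b must be at least the start-point range A")
    while True:
        starts = [rng.uniform(0.0, A) for _ in v]
        drifts = [rng.gauss(mean_v, s) for mean_v in v]
        # Time for each accumulator to rise linearly from its start point to b;
        # accumulators with non-positive drift never reach the threshold.
        times = [(b - k) / d if d > 0 else float("inf")
                 for k, d in zip(starts, drifts)]
        winner = min(range(len(times)), key=times.__getitem__)
        if times[winner] != float("inf"):
            return winner, t0 + times[winner]
        # Every drift was non-positive (no response): redraw the trial.


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_lba_trial([1.0, 0.5], b=1.0, A=0.5, t0=0.2, rng=rng)
              for _ in range(1000)]
    print("P(accumulator 0 wins) =", sum(c == 0 for c, _ in trials) / len(trials))
```

With identical random draws, raising b lengthens every simulated RT. This is why a lower b in adaptive conditions would be read as reduced caution.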

Sample Size & Power

The hierarchical LBA analysis is run on verification-time RTs, with parameters (v, b, A, t₀) varying by modality and UI mode. Power and parameter recovery in diffusion/accumulator models depend more on trials per participant than on sheer N, but group-level comparisons still require enough participants. Studies on parameter recovery for DDM/LBA and related models generally recommend ≥100 trials per condition and at least 30–40 participants for stable hierarchical estimates. Our design provides ≈192 trials per participant in total (≈24 trials × 8 conditions). This is below the ≥100-per-condition guideline, so per-condition estimates rely on hierarchical pooling across participants. For group-level parameter differences, a target of N ≥ 64 is advisable for narrower credible intervals on parameter contrasts.

**LBA Analysis Results**

Parameters were estimated with a hierarchical Bayesian LBA model (PyMC).

**LBA Parameters by Modality and UI Mode:**



Table: LBA Parameter Estimates

|Modality |UI_Mode  |  t0_mu| vc_base_mu| vc_slope_mu| gap_int_mu| gap_slope_mu|  ve_mu|
|:--------|:--------|------:|----------:|-----------:|----------:|------------:|------:|
|hand     |static   | -0.565|     -3.673|      -1.846|     -0.619|        0.098| -4.952|
|hand     |adaptive | -0.588|     -3.673|      -1.846|     -0.619|        0.098| -4.952|
|gaze     |static   | -0.591|     -3.673|      -1.846|     -0.619|        0.098| -4.952|
|gaze     |adaptive | -0.561|     -3.673|      -1.846|     -0.619|        0.098| -4.952|

**Model Diagnostics:**
- MCMC trace saved to: `lba_trace.nc`
- Trace plots available: `lba_trace_plot.png`
- Parameter summary: `lba_parameters_summary.csv`

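**Note:** Review trace plots and R-hat diagnostics to assess convergence. In the table above, only t0_mu differs across rows; the other five parameters are identical in every condition. This suggests they were estimated as condition-invariant group-level values (or the same values were copied into each row), so the table cannot support condition contrasts on drift or threshold. The negative values of t0_mu and the drift terms also suggest these parameters are reported on a transformed (e.g., log) scale rather than in seconds or raw evidence units.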

**Parameter Interpretation:**
- **t0 (non-decision time):** Time for stimulus encoding and motor execution, varies by modality and UI mode
- **vc_base (drift rate base):** Baseline accumulation rate for correct responses
- **vc_slope (drift rate slope):** How drift rate changes with task difficulty (ID)
- **gap_int (threshold gap intercept):** Baseline decision threshold above start point
- **gap_slope (threshold gap slope):** How threshold changes with pressure (speed-accuracy tradeoff)
- **ve (error drift rate):** Accumulation rate for error responses

18. Control Theory Analysis: Submovement Models

Research Question: How does the control loop efficiency differ across conditions? Do adaptive interventions reduce movement corrections?

Sample Size & Power

Trajectory-based kinematic metrics (velocity profiles, jerk, normalized jerk, primary vs corrective phases) are rich but correlated and often noisier than basic RT/TP measures. Because they are derived from the same trial-level data, their within-subject effect sizes are likely small-to-medium, with substantial individual differences. For these analyses, N = 64 is a good target for stronger inferential claims about UI-mode improvements in movement smoothness or control-loop efficiency. As with LBA, simulation-based power analyses tailored to these specific metrics would be ideal but are beyond the scope of this report (Kumle et al., 2021).

Submovement metrics in this report include pre-computed submovement_count (see Section 10). Full trajectory-based control-theory models (jerk, duration-normalized jerk, primary vs corrective phases) can be implemented using trajectory logging data.
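Raw jerk grows sharply as movements get shorter, so it is usually normalized before comparison. The sketch below computes one common duration- and amplitude-normalized form, sqrt(0.5 × ∫|jerk|² dt × D⁵ / L²), where D is movement duration and L is path length; choosing path length rather than straight-line amplitude for L is an assumption. It uses plain finite differences. On ~60 fps data, triple differencing amplifies noise, so in practice the trajectory would normally be low-pass filtered first (filter choice not shown). The minimum-jerk reach is included only as a check, since its normalized jerk is analytically sqrt(360) ≈ 18.97 regardless of size or speed.

```python
import math


def derivative(values, dt):
    """Finite-difference derivative: central inside, one-sided at the ends."""
    n = len(values)
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / dt
    out[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * dt)
    return out


def normalized_jerk(xs, ys, dt):
    """Dimensionless jerk: sqrt(0.5 * integral(|jerk|^2) * D^5 / L^2).

    D is movement duration and L is path length, so the value does not
    change when the same movement shape is scaled in size or speed.
    """
    jx = derivative(derivative(derivative(xs, dt), dt), dt)
    jy = derivative(derivative(derivative(ys, dt), dt), dt)
    sq = [a * a + b * b for a, b in zip(jx, jy)]
    integral = dt * (sum(sq) - 0.5 * (sq[0] + sq[-1]))  # trapezoid rule
    duration = dt * (len(xs) - 1)
    length = sum(math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])
                 for i in range(len(xs) - 1))
    return math.sqrt(0.5 * integral * duration ** 5 / length ** 2)


def min_jerk_path(amplitude, duration, n=1001):
    """Straight minimum-jerk reach sampled at n points (for checking only)."""
    dt = duration / (n - 1)
    xs = [amplitude * (10 * u**3 - 15 * u**4 + 6 * u**5)
          for u in (i / (n - 1) for i in range(n))]
    return xs, [0.0] * n, dt


# Should print a value close to sqrt(360), about 18.97.
print(normalized_jerk(*min_jerk_path(300.0, 0.8)))
```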

The Optimized Submovement Model (Meyer et al., 1988) posits that pointing movements consist of a primary ballistic impulse followed by n corrective submovements. The submovement count (N_sub) therefore serves as a proxy for control-loop efficiency. In gaze-based interaction, simulated lag and saccadic blindness force users into an intermittent control regime, which should theoretically increase N_sub.

Power Analysis Summary:

- The N=64 target provides good power for medium main effects: power≈0.80 at dz≈0.41 is already reached near N=48, so N=64 leaves a margin.
- Interactions may be underpowered unless large (treat as exploratory).
- 60fps trajectory data improves measurement precision but doesn't increase effective N.
- Key considerations: use duration-normalized smoothness metrics, control for multiple comparisons (FDR), and pre-specify outcomes.
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations.
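The arithmetic behind these power figures can be checked with a normal approximation for a paired (within-subject) contrast, power ≈ Φ(dz·√n − z₁₋α/₂). This is a sketch, not the calculation used in POWER_ANALYSIS_EXPERT_RESPONSE.md; an exact t-based calculation (as in G*Power) needs a few more participants than the approximation. Under it, dz = 0.41 reaches 80% power at roughly n = 47, and n = 64 gives power of about 0.9.

```python
import math
from statistics import NormalDist


def paired_power(dz, n, alpha=0.05):
    """Approximate two-sided power of a within-subject (paired) test.

    Normal approximation: power ~ Phi(dz * sqrt(n) - z_{1 - alpha/2}).
    It ignores the t-distribution correction (slightly optimistic at small n)
    and the negligible rejection probability in the opposite tail.
    """
    std_normal = NormalDist()
    return std_normal.cdf(dz * math.sqrt(n) - std_normal.inv_cdf(1 - alpha / 2))


def n_for_power(dz, power=0.80, alpha=0.05):
    """Smallest n reaching the target power under the same approximation."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil((z_sum / dz) ** 2)


for n in (48, 64):
    print(n, round(paired_power(0.41, n), 3))
print("n for 80% power at dz = 0.41:", n_for_power(0.41))
```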

✅ **Trajectory Data Available**

**Note:** Full trajectory processing requires parsing JSON and computing derivatives.
For this report, we use pre-computed `submovement_count` from Section 10.
Advanced trajectory processing (velocity profiles, jerk) can be implemented
using the `analysis/r/process_trajectory.R` script for detailed analyses.
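The project's own tool for this step is `analysis/r/process_trajectory.R`; the Python below is only an independent illustration of the first stage (parse the JSON, then differentiate position into speed). The sample schema, a list of `{"t": ms, "x": px, "y": px}` objects, is a hypothetical assumption and should be renamed to whatever FittsTask.tsx actually logs.

```python
import json
import math


def speed_profile(trajectory_json):
    """Return (times_s, speeds_px_per_s) from one trial's trajectory string.

    Hypothetical schema: a JSON list of samples {"t": ms, "x": px, "y": px};
    rename the keys to match what FittsTask.tsx actually logs.
    """
    samples = sorted(json.loads(trajectory_json), key=lambda s: s["t"])
    t = [s["t"] / 1000.0 for s in samples]
    x = [float(s["x"]) for s in samples]
    y = [float(s["y"]) for s in samples]
    times, speeds = [], []
    # Central differences over each sample's neighbours tolerate the uneven
    # frame spacing typical of ~60 fps browser logging.
    for i in range(1, len(samples) - 1):
        dt = t[i + 1] - t[i - 1]
        if dt <= 0:
            continue  # skip duplicated timestamps
        speeds.append(math.hypot(x[i + 1] - x[i - 1], y[i + 1] - y[i - 1]) / dt)
        times.append(t[i])
    return times, speeds


# Synthetic check: 5 px per frame at 60 fps is 300 px/s.
demo = json.dumps([{"t": i * 1000 / 60, "x": 5.0 * i, "y": 0.0} for i in range(10)])
times, speeds = speed_profile(demo)
print([round(s, 1) for s in speeds])
```

Dividing by the actual timestamp span, rather than assuming a fixed 1/60 s frame, keeps speeds correct when frames are dropped.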

- N trials with trajectory: 15309 
- Participants: 71 

**Current Analysis:** Using pre-computed metrics from Section 10.
**Future Enhancement:** Full trajectory processing available in `analysis/r/process_trajectory.R`

Submovement Count (Control Loop Efficiency)

Submovement Count by Condition (N = 71 participants, using submovement_count_recomputed)
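|modality |ui_mode  | pressure| N_participants| N_trials| Mean Submovements| SD Submovements| Median Submovements|
|:--------|:--------|--------:|--------------:|--------:|-----------------:|---------------:|-------------------:|
|hand     |static   |        0|             63|     1715|              0.00|            0.00|                   0|
|hand     |static   |        1|             65|     1748|              0.00|            0.00|                   0|
|hand     |adaptive |        0|             64|     1738|              0.00|            0.00|                   0|
|hand     |adaptive |        1|             65|     1758|              0.00|            0.00|                   0|
|gaze     |static   |        0|             69|     1545|              8.51|            4.76|                   8|
|gaze     |static   |        1|             68|     1501|              8.68|            4.62|                   8|
|gaze     |adaptive |        0|             70|     1572|              8.85|            5.26|                   8|
|gaze     |adaptive |        1|             68|     1540|              9.06|            5.04|                   8|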

Interpretation: Lower submovement counts indicate smoother, more ballistic movements. Adaptive UI is expected to reduce corrective submovements by expanding targets.

ℹ **Note:** Hand modality shows zero submovements (smooth movements). Plot shows gaze modality only.

Submovement Count (Control Loop Efficiency) by Modality and UI Mode. N = 71 participants. Lower values indicate smoother, more ballistic movements.

Statistical Model: Submovement Count

### Model: Submovement Count

**Note:** Hand modality shows zero submovements (smooth, ballistic movements).
Gaze modality shows 8.8 submovements per trial on average.
The model below therefore uses gaze-only data to test UI mode and pressure effects.

**Model:** log(submovement_count + 1) ~ ui_mode * pressure + (1 | pid) [Gaze modality only]

**Data Summary:** 71 participants, 6158 trials (gaze only).

**Rationale:** Hand modality shows zero submovements (smooth movements), so analysis focuses on gaze where submovements are present.
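Because the model is fit to log(submovement_count + 1), its estimates live on that log scale, and effects are multiplicative on (count + 1). Mapping a mean log value back with exp(·) − 1 gives a value at or below the arithmetic mean of the raw counts (Jensen's inequality, since the logarithm is concave). If the M values in the written results are back-transformed model estimates (the report does not say so; this is an assumption), that would explain why they sit below the raw condition means in the descriptive table. The counts below are illustrative, not study data.

```python
import math
from statistics import mean


def back_transformed_mean(counts):
    """Average on the log(count + 1) scale, mapped back to counts."""
    return math.expm1(mean(math.log1p(c) for c in counts))


# Illustrative counts only (not study data).
counts = [2, 4, 8, 8, 12, 20]
print(mean(counts), round(back_transformed_mean(counts), 2))
```

The two summaries coincide only when every count is identical; the more spread out the counts, the larger the gap.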

#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF  DenDF F value  Pr(>F)  
ui_mode          0.37081 0.37081     1 6089.6  3.2036 0.07352 .
pressure         0.73574 0.73574     1 6087.8  6.3564 0.01172 *
ui_mode:pressure 0.07272 0.07272     1 6093.6  0.6283 0.42801  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Written Results (APA Style)

**Modality Comparison (Descriptive):** Hand input produced zero submovements in every condition (SD = 0), consistent with smooth, ballistic movements, although a detection artifact cannot be ruled out (see Potential Issues to Check). Gaze input produced 8.8 submovements per trial on average (SD = 4.9), consistent with intermittent control due to lag and saccadic blindness.

**UI Mode Effect (Gaze):** A linear mixed-effects model on log-transformed submovement count for the gaze modality showed no significant main effect of UI mode, F(1, 6089.6) = 3.20, p = 0.074, η²p = 0.001 (negligible effect).
Adaptive UI did not reduce submovements relative to Static UI in the gaze modality (M = 7.72 vs. M = 7.58).

**Pressure Effect (Gaze):** The main effect of pressure on submovement count was significant, F(1, 6087.8) = 6.36, p = 0.012, η²p = 0.001 (negligible effect).

**UI Mode × Pressure Interaction:** The interaction was non-significant, F(1, 6093.6) = 0.63, p = 0.428, η²p = 0.000 (negligible effect).
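The η²p values above can be recovered from each F statistic and its degrees of freedom with the standard conversion η²p = F·df₁ / (F·df₁ + df₂). Whether the report used exactly this formula is an assumption, but it reproduces the reported values to three decimals. With Satterthwaite denominator degrees of freedom from a mixed model, the result is an approximate effect size rather than a variance ratio computed from sums of squares.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F values and (Satterthwaite) degrees of freedom from the ANOVA table above.
anova = {
    "ui_mode": (3.2036, 1, 6089.6),
    "pressure": (6.3564, 1, 6087.8),
    "ui_mode:pressure": (0.6283, 1, 6093.6),
}
for term, (f, df1, df2) in anova.items():
    print(f"{term:17s} eta2p = {partial_eta_squared(f, df1, df2):.4f}")
```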

Implementation Notes:

- Basic submovement analysis is already in Section 10 (Movement Quality Metrics).
- Trajectory data is now available in the `trajectory` column (JSON string, logged at ~60fps).
- The current `submovement_count` is pre-calculated in FittsTask.tsx using velocity peaks.
- Power: N=48 is sufficient for main effects (dz≈0.41, power≈0.80); interactions are underpowered (treat as exploratory).
- Key considerations:
  - Use duration-normalized smoothness metrics (jerk is duration-sensitive).
  - Control for multiple comparisons (FDR) if testing many kinematic features.
  - Pre-specify a small set of theoretically motivated outcomes.
  - 60fps improves measurement precision but doesn't increase effective N.
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed power analysis and recommendations.

Potential Issues to Check:

- Verify that the `submovement_count` calculation in FittsTask.tsx matches the Optimized Submovement Model definition.
- Check whether velocity-profile data are needed or whether the pre-calculated counts are sufficient.
- Ensure the submovement detection algorithm handles both hand and gaze modalities correctly.
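As an independent cross-check on the FittsTask.tsx counter (whose exact rules are not documented here), a minimal velocity-peak counter applied to a speed profile derived from the logged trajectory could look like the sketch below. The thresholds `min_peak` and `min_drop` are hypothetical and would need tuning. It reports both total peaks and corrective submovements (peaks after the first), because it is worth confirming which of the two `submovement_count` stores: under the second definition, a single-peak hand movement scores zero.

```python
def count_speed_peaks(speeds, min_peak=50.0, min_drop=0.2):
    """Count velocity peaks in a speed profile.

    Hypothetical rules: a peak must reach min_peak (px/s), and two peaks are
    kept separate only if speed dips by at least min_drop (as a fraction of
    the smaller peak) between them; otherwise they merge into one.
    Returns (total_peaks, corrective_submovements), treating the first peak
    as the primary ballistic impulse.
    """
    candidates = [
        i for i in range(1, len(speeds) - 1)
        if speeds[i - 1] < speeds[i] >= speeds[i + 1] and speeds[i] >= min_peak
    ]
    peaks = []
    for i in candidates:
        if peaks:
            prev = peaks[-1]
            valley = min(speeds[prev:i + 1])
            if valley > (1.0 - min_drop) * min(speeds[prev], speeds[i]):
                # No clear dip: same submovement, keep the higher maximum.
                if speeds[i] > speeds[prev]:
                    peaks[-1] = i
                continue
        peaks.append(i)
    return len(peaks), max(len(peaks) - 1, 0)


single = [0, 100, 300, 500, 300, 100, 0]
double = [0, 200, 500, 200, 40, 150, 300, 150, 0]
ripple = [0, 200, 480, 470, 500, 300, 0]
for name, profile in [("single", single), ("double", double), ("ripple", ripple)]:
    print(name, count_speed_peaks(profile))
```

The dip requirement keeps small ripples (sensor jitter or simulated gaze noise) from being counted as separate corrections; without it, noisy gaze traces would inflate the count.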


19. Summary & Conclusions

Key Findings Summary

Summary of Key Metrics by Condition (N=81)
|modality |ui_mode  |Metric               |   Mean|     SD|
|:--------|:--------|:--------------------|------:|------:|
|hand     |static   |Effective Width (px) | 33.400| 20.600|
|hand     |adaptive |Effective Width (px) | 34.110| 21.200|
|gaze     |static   |Effective Width (px) | 35.780| 19.670|
|gaze     |adaptive |Effective Width (px) | 35.580| 19.510|
|hand     |static   |Error Rate (%)       |  1.710| 12.960|
|hand     |adaptive |Error Rate (%)       |  1.740| 13.080|
|gaze     |static   |Error Rate (%)       | 19.470| 39.600|
|gaze     |adaptive |Error Rate (%)       | 18.170| 38.560|
|hand     |static   |Movement Time (s)    |  1.096|  0.418|
|hand     |adaptive |Movement Time (s)    |  1.090|  0.406|
|gaze     |static   |Movement Time (s)    |  1.176|  0.482|
|gaze     |adaptive |Movement Time (s)    |  1.239|  0.552|
|hand     |static   |Throughput (bits/s)  |  3.550|  0.950|
|hand     |adaptive |Throughput (bits/s)  |  3.530|  0.950|
|gaze     |static   |Throughput (bits/s)  |  3.220|  1.070|
|gaze     |adaptive |Throughput (bits/s)  |  3.140|  1.100|

Data Quality Notes

  • Participants: 81
  • Valid Trials: 14953 (out of 17442 total experimental trials)
  • Exclusion Rate: 14% (due to errors, timeouts, or invalid RTs)
  • Trials per Participant: Mean = 184.6, Range = 82 - 406

Target Sample: N=64 participants for enhanced power in advanced analyses (LBA, control-theory kinematics).

Input Device Exclusion

Participants reported their input device during demographics collection (mouse, trackpad, or other). For hand modality analyses, we excluded trials from participants who used trackpads (n=6, 7.4% of sample), as trackpad vs. mouse input is a known confound in pointing task performance (MacKenzie & Jusoh, 2001; Karam et al., 2009). Trackpads have different acceleration curves, precision characteristics, and motor control requirements that can significantly affect hand modality performance.

For gaze modality analyses, we included all participants regardless of input device, as our physiologically informed gaze simulation converts raw input (mouse or trackpad) into gaze-like coordinates with Gaussian jitter, lag, and saccadic suppression. Once converted, the input device should not affect the gaze interaction characteristics we are measuring. This approach maximized data utilization while maintaining validity: hand modality comparisons use only mouse users (standardized device), while gaze modality comparisons use all participants (simulation-normalized input).

Exclusion impact: Approximately 648 hand trials from trackpad users were excluded (3.7% of total trials). All 648 gaze trials from trackpad users were retained. Hand modality analyses include N=75 participants (mouse users only); gaze modality analyses include N=81 participants (75 mouse + 6 trackpad users).

For detailed exclusion criteria, see EXCLUSION_CRITERIA.md and INPUT_DEVICE_EXCLUSION_STRATEGY.md. For technical audit details, see AUDIT_REPORT.md.

Report generated on 2026-01-17 00:17:59